Abstract

In this paper, we derive Hybrid, Bayesian and Marginalized Cramer Rao Lower Bounds (HCRB, BCRB and MCRB) for the single measurement vector and multiple measurement vector Sparse Bayesian Learning (SBL) problem of estimating compressible vectors and their prior distribution parameters. We assume the unknown vector to be drawn from a compressible Student-t prior distribution. We derive CRBs that encompass the deterministic or random nature of the unknown parameters of the prior distribution and the regression noise variance. We extend the MCRB to the case where the compressible vector is distributed according to a general compressible prior distribution, of which the generalized Pareto distribution is a special case. We use the derived bounds to uncover the relationship between the compressibility and the Mean Square Error (MSE) in the estimates. Further, we illustrate the tightness and utility of the bounds through simulations, by comparing them with the MSE performance of two popular SBL-based estimators. We find that the MCRB is generally the tightest among the bounds derived, and that the MSE performance of the Expectation-Maximization (EM) algorithm coincides with the MCRB for the compressible vector. Through simulations, we also demonstrate how the lower bounds, as well as the MSE performance of SBL-based estimators, depend on the compressibility of the vector for several values of the number of observations and the number of measurements, and at different signal powers.

Index Terms 

Sparse Bayesian learning, mean square error, Cramer Rao lower bounds, Student-t, compressible
priors, expectation maximization. 
I. Introduction

Recent results in the theory of compressed sensing have generated immense interest in sparse vector estimation problems, resulting in a multitude of successful practical signal recovery algorithms. In several applications, such as the processing of natural images, audio and speech, signals are not exactly sparse but compressible, i.e., the magnitudes of the sorted coefficients of the vector follow a power law decay [1]. In [2] and [3], the authors show that vectors drawn from a special class of probability density functions (pdfs), known as compressible priors, result in compressible vectors. Assuming that the vector to be estimated (henceforth referred to as the unknown vector) has a compressible prior distribution enables one to formulate the compressible vector recovery problem in the Bayesian framework, thus allowing the use of Sparse Bayesian Learning (SBL) techniques [4]. In his seminal work, Tipping proposed an SBL algorithm for estimating the unknown vector, based on the Expectation Maximization (EM) and MacKay updates [4]. Since these update rules are known to be slow, fast update techniques are proposed in [5]. A duality-based algorithm for solving the SBL cost function is proposed in [6], and $\ell_1$- and $\ell_2$-norm based reweighting schemes are explored in [7]. Such algorithms have been successfully employed for image/visual tracking [8], neuro-imaging [9], [10], beamforming [11], and joint channel estimation and data detection for OFDM systems [12].

Many of the aforementioned papers study the complexity, convergence and support recovery properties 
of SBL based estimators (e.g., [5], [6]). In [3], the general conditions required for the so-called instance 
optimality of such estimators are derived. However, it is not known whether these recovery algorithms 
are optimal in terms of the Mean Square Error (MSE) in the estimate or by how much their performance 
can be improved. In the context of estimating sparse signals, Cramer Rao lower bounds on the MSE
performance are derived in [13]–[15]. However, to the best of our knowledge, none of the existing works
provide a lower bound on the MSE performance of compressible vector estimation. Such bounds are 
necessary, as they provide absolute yardsticks for comparative analysis of estimators, and may also be 
used as a criterion for minimization of MSE in certain problems [16]. In this paper, we close this gap in 
theory by providing Cramer Rao type lower bounds on the MSE performance of estimators in the SBL 
framework. 
As our starting point, we consider a linear Single Measurement Vector (SMV) SBL model given by

$$\mathbf{y} = \Phi\mathbf{x} + \mathbf{n}, \tag{1}$$

where the observations $\mathbf{y} \in \mathbb{R}^{N}$ and the measurement matrix $\Phi \in \mathbb{R}^{N\times L}$ are known, and $\mathbf{x} \in \mathbb{R}^{L}$ is the unknown sparse/compressible vector to be estimated [17]. Each component of the additive noise $\mathbf{n} \in \mathbb{R}^{N}$ is white Gaussian, distributed as $\mathcal{N}(0, \sigma^2)$, where the variance $\sigma^2$ may be known or unknown.
The SMV-SBL system model in (1) can be generalized to a linear Multiple Measurement Vector (MMV) SBL model given by

$$\mathbf{T} = \Phi\mathbf{W} + \mathbf{V}. \tag{2}$$

Here, $\mathbf{T} \in \mathbb{R}^{N\times M}$ represents the $M$ observation vectors, the columns of $\mathbf{W} \in \mathbb{R}^{L\times M}$ are the $M$ sparse/compressible vectors, and each column of $\mathbf{V} \in \mathbb{R}^{N\times M}$ is modeled similar to $\mathbf{n}$ in (1) [18]. The $M$ vectors in $\mathbf{W}$ share a common underlying compressible distribution, and (1) is the special case of (2) with $M = 1$.

In typical compressible vector estimation problems, $\Phi$ is underdetermined ($N < L$), rendering the problem ill-posed. Bayesian techniques circumvent this problem by using a prior distribution on the compressible vector as a regularization, and computing the corresponding posterior estimate. To incorporate a compressible prior in (1) and (2), SBL uses a two-stage hierarchical model on the unknown vector, as shown in Fig. 1. Here, $\mathbf{x} \sim \mathcal{N}(0, \Gamma)$, where the diagonal matrix $\Gamma$ contains the hyperparameters $\gamma = (\gamma_1, \ldots, \gamma_L)$ as its diagonal elements. Further, an Inverse Gamma (IG) hyperprior is assumed for $\gamma$ itself, because it leads to a Student-t prior on the vector $\mathbf{x}$, which is known to be compressible [4]$^{1}$. In scenarios where the noise variance is unknown and random, an IG prior is used for the distribution of the noise variance as well. For the system model in (2), every compressible vector $\mathbf{w}_i \sim \mathcal{N}(0, \Gamma)$, i.e., the $M$ compressible vectors are governed by a common $\Gamma$.

It is well known that the Cramer Rao Lower Bound (CRLB) provides a fundamental limit on the MSE performance of unbiased estimators of deterministic parameters [19]. For the estimation problem in SBL, an analogous bound known as the Bayesian Cramer Rao Bound (BCRB) is used to obtain lower bounds [20], by incorporating the prior distribution on the unknown vector. If the unknown vector consists of both deterministic and random components, Hybrid Cramer Rao Bounds (HCRB) are derived [21].

$^{1}$The IG hyperprior is conjugate to the Gaussian pdf [4].

Fig. 1. Graphical model for SBL: two-stage hierarchical model with the compressible vector taking a conditional Gaussian distribution and the hyperparameters taking an Inverse Gamma distribution. The noise is modeled as white Gaussian distributed, with the noise variance modeled as deterministic/random and known or unknown.

Fig. 2. Summary of the lower bounds derived in this work when noise variance is assumed to be known.

In SBL, the unknown vector estimation problem can also be viewed as a problem involving nuisance parameters. Since the assumed hyperpriors are conjugate to the Gaussian likelihood, the marginalized distributions have a closed form, and the Marginalized Cramer Rao Bounds (MCRB) [22] can be derived. For example, in the SBL hyperparameter estimation problem, $\mathbf{x}$ itself can be considered a nuisance variable and marginalized from the joint distribution $p_{Y,X;\gamma}(\mathbf{y},\mathbf{x};\gamma)$ to obtain the marginalized log-likelihood

$$L(\gamma) = \log \int p_{Y,X;\gamma}(\mathbf{y},\mathbf{x};\gamma)\, d\mathbf{x} = -\frac{1}{2}\left(\log|\Sigma_y| + \mathbf{y}^T\Sigma_y^{-1}\mathbf{y}\right) + \text{const}, \tag{3}$$

where $\Sigma_y = \sigma^2 I + \Phi\Gamma\Phi^T$ [23], and the constant does not depend on $\gamma$.
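As an illustration of (3), the sketch below evaluates the marginalized log-likelihood for given $\gamma$ and $\sigma^2$, assuming NumPy is available. The function name `sbl_log_marginal` is hypothetical; it includes the $-\frac{N}{2}\log 2\pi$ term so that the value is a proper Gaussian log-density, and it uses `slogdet` and a linear solve instead of an explicit inverse for numerical stability.

```python
import numpy as np

def sbl_log_marginal(y, Phi, gamma, sigma2):
    """Marginalized log-likelihood L(gamma) of (3), i.e. log N(y; 0, Sigma_y),
    including the -N/2 log(2 pi) constant."""
    N = y.shape[0]
    # Sigma_y = sigma^2 I + Phi Gamma Phi^T with Gamma = diag(gamma);
    # (Phi * gamma) scales column j of Phi by gamma_j, i.e. forms Phi Gamma.
    Sigma_y = sigma2 * np.eye(N) + (Phi * gamma) @ Phi.T
    _, logdet = np.linalg.slogdet(Sigma_y)   # log|Sigma_y| (Sigma_y is positive definite)
    quad = y @ np.linalg.solve(Sigma_y, y)   # y^T Sigma_y^{-1} y
    return -0.5 * (logdet + quad) - 0.5 * N * np.log(2 * np.pi)

# Example with a random +/-1 measurement matrix (N < L)
rng = np.random.default_rng(0)
N, L = 8, 16
Phi = rng.choice([-1.0, 1.0], size=(N, L))
gamma = rng.gamma(2.0, 1.0, size=L)
y = rng.standard_normal(N)
print(sbl_log_marginal(y, Phi, gamma, sigma2=0.1))
```

Working with $\log|\Sigma_y|$ through `slogdet` avoids the overflow or underflow that an explicit determinant can hit for larger $N$.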

The goal of this paper is to derive Cramer Rao type lower bounds on the MSE performance of estimators based on the SBL framework. Our contributions are as follows:

• Under the assumption of known noise variance, we derive the HCRB and the BCRB for the unknown vector $\theta = [\mathbf{x}, \gamma]$, as indicated in the left half of Fig. 2.

• When the noise variance is known, we marginalize the nuisance variables ($\gamma$ or $\mathbf{x}$) to obtain a closed form pdf, from which we derive the MCRB, as indicated in the right half of Fig. 2. Since the MCRB is a function of the parameters of the hyperprior, it yields insights into the relationship between the MSE performance of the estimators and the compressibility of $\mathbf{x}$.

• In the unknown noise variance case, we derive the BCRB, HCRB and MCRB for the unknown vector $\theta = [\mathbf{x}, \gamma, \sigma^2]$, as indicated in Fig. 3.

• We derive the MCRB for a general parametric form of the compressible prior [3] and deduce lower bounds for two well-known compressible priors, the Student-t and the generalized double Pareto.

• We extend the BCRB, HCRB and MCRBs to the MMV-SBL model given in (2).

Through numerical simulations, we show that the MCRB on the compressible vector $\mathbf{x}$ is the tightest lower bound, and that the MSE performance of the EM algorithm achieves this bound in certain scenarios. The techniques used to derive the bounds can be extended to handle other compressible prior pdfs used in the literature [2]. These results provide a convenient and easy-to-compute benchmark for comparing the performance of existing estimators and, in some cases, for establishing their optimality in terms of MSE performance.



HCRB from $p(\mathbf{y},\mathbf{x},\sigma^2;\gamma)$: $\gamma$ deterministic, $\mathbf{x}$ random, $\sigma^2$ random. BCRB from $p(\mathbf{y},\mathbf{x},\sigma^2,\gamma)$: $\gamma$, $\mathbf{x}$ and $\sigma^2$ random. MCRB from $p(\mathbf{y};\gamma,\sigma^2)$: $\gamma$ deterministic, $\mathbf{x}$ marginalized, $\sigma^2$ deterministic.

Fig. 3. Different modeling assumptions and the corresponding bounds derived in this work when noise variance is assumed to be unknown.



The rest of this paper is organized as follows. In Section II, we provide the basic definitions and describe the problem setup. In Sections III and IV, we derive the lower bounds for the cases shown in Figs. 2 and 3, respectively. The bounds are extended to the MMV-SBL signal model in Section V. The efficacy of the lower bounds is graphically illustrated through simulation results in Section VI. We provide some concluding remarks in Section VII. In the Appendix, we provide proofs for the Propositions and Theorems stated in the paper.

Notation: In the sequel, boldface small letters denote vectors and boldface capital letters denote matrices. The symbols $(\cdot)^T$ and $|\cdot|$ denote the transpose and determinant of a matrix, respectively. The empty set is represented by $\emptyset$, and $\Gamma(\cdot)$ denotes the Gamma function. The function $p_X(x)$ denotes the pdf of the random variable $X$ evaluated at its realization $x$. Also, $\mathrm{diag}(\mathbf{a})$ stands for a diagonal matrix with entries on the diagonal given by the vector $\mathbf{a}$. The symbol $\nabla_{\theta}$ denotes the gradient w.r.t. the vector $\theta$. The expectation with respect to a random variable $X$ is denoted as $E_X(\cdot)$. Also, $A \succeq B$ means that $A - B$ is a positive semidefinite matrix, and $A \otimes B$ gives the Kronecker product of the two matrices $A$ and $B$.



II. Preliminaries

As a precursor to the sections that follow, we define the MSE matrix and the Fisher Information Matrix (FIM) [19], and state the assumptions under which we derive the lower bounds in this paper. Consider a general estimation problem where the unknown vector $\theta \in \mathbb{R}^{n}$ can be split into sub-vectors $\theta = [\theta_r^T, \theta_d^T]^T$, where $\theta_r \in \mathbb{R}^{m}$ consists of random parameters distributed according to a known pdf, and $\theta_d \in \mathbb{R}^{n-m}$ consists of unknown deterministic parameters. Let $\hat{\theta}(\mathbf{y})$ denote the estimator of $\theta$ as a function of the observations $\mathbf{y}$. The MSE matrix $E^{\theta}$ is defined as

$$E^{\theta} \triangleq E_{Y,\Theta_r}\left[(\theta - \hat{\theta}(\mathbf{y}))(\theta - \hat{\theta}(\mathbf{y}))^T\right], \tag{4}$$

where $\Theta_r$ denotes the random parameters to be estimated (whose realization is given by $\theta_r$). The first step in obtaining Cramer Rao type lower bounds is to derive the FIM $I^{\theta}$ [19]. Typically, $I^{\theta}$ is expressed in terms of the individual blocks of submatrices, where the $(ij)$th block is given by

$$I^{\theta}_{ij} \triangleq -E_{Y,\Theta_r}\left[\nabla_{\theta_i}\nabla_{\theta_j}^T \log p_{Y,\Theta_r;\theta_d}(\mathbf{y},\theta_r;\theta_d)\right]. \tag{5}$$

In this paper, we use the notation $I^{\theta}$ to represent the FIM under the different modeling assumptions. For example, when $\theta_r \neq \emptyset$ and $\theta_d \neq \emptyset$, $I^{\theta}$ represents a Hybrid Information Matrix (HIM). When $\theta_r \neq \emptyset$ and $\theta_d = \emptyset$, $I^{\theta}$ represents a Bayesian Information Matrix (BIM). Assuming that the MSE matrix $E^{\theta}$ exists and the FIM is non-singular, a lower bound on the MSE matrix $E^{\theta}$ is given by the inverse of the FIM:

$$E^{\theta} \succeq \left(I^{\theta}\right)^{-1}. \tag{6}$$
It is easy to verify that the underlying pdfs considered in the SBL model satisfy the regularity conditions
required for computing the FIM (see Sec. 5.2.3 in [22]). 

We conclude this section with a useful observation about the FIM in the SBL problem. The SMV-SBL framework assumes that $\mathbf{x}$ and $\mathbf{n}$ are independent of each other (for the MMV-SBL model, $\mathbf{W}$ and $\mathbf{V}$ are independent). This assumption is reflected in the graphical model in Fig. 1, where the compressible vector $\mathbf{x}$ (and its attribute $\gamma$) and the noise component $\mathbf{n}$ (and its attribute $\sigma^2$) lie on unconnected branches. Due to this, a submatrix of the FIM is of the form

$$I^{\theta}_{\gamma\xi} = -E_{X,Y,\Gamma,\Xi}\left[\nabla_{\gamma}\nabla_{\xi}^T\left\{\log p_{Y/X,\Xi}(\mathbf{y}/\mathbf{x},\xi) + \log p_{X,\Gamma}(\mathbf{x},\gamma) + \log p_{\Xi}(\xi)\right\}\right], \tag{7}$$

where no term depends jointly on $\gamma$ and $\xi = \sigma^2$. Hence, the mixed second derivatives, and therefore all entries of this submatrix, are zero. This is formally stated in the following lemma.

Lemma 1: When $\theta_i = \gamma$ and $\theta_j = \sigma^2$, the $ij$th block $I^{\theta}_{ij}$ of the information matrix $I^{\theta}$, given by (5), is an all-zero matrix.

III. SMV-SBL: Lower Bounds when $\sigma^2$ is Known

In this section, we derive lower bounds for the system model in (1) for the scenarios in Fig. 2, where the unknown vector is $\theta = [\mathbf{x}, \gamma]$. We examine different modeling assumptions on $\gamma$ and derive the corresponding lower bounds.

A. Bounds from the Joint pdf

1) HCRB for $\theta = [\mathbf{x}, \gamma]$: In this subsection, we consider the unknown variables as a hybrid of a deterministic vector $\gamma$ and a random vector $\mathbf{x}$ distributed according to a Gaussian distribution conditioned on the unknown parameter $\gamma$. Using the assumptions and notation in the previous section, we obtain the following proposition.

Proposition 1: For the signal model in (1), the HCRB on the MSE matrix $E^{\theta}$ of an unknown vector $\theta = [\mathbf{x}, \gamma]$, where the conditional distribution of the unknown compressible signal $\mathbf{x}/\gamma$ is $\mathcal{N}(0, \Gamma)$ and $\gamma$ is modeled as an unknown deterministic parameter, is given by $E^{\theta} \succeq (H^{\theta})^{-1}$, where

$$H^{\theta} = \begin{bmatrix} H^{\theta}(\mathbf{x}) & H^{\theta}(\mathbf{x},\gamma) \\ (H^{\theta}(\mathbf{x},\gamma))^T & H^{\theta}(\gamma) \end{bmatrix} = \begin{bmatrix} \frac{\Phi^T\Phi}{\sigma^2} + \Gamma^{-1} & 0_{L\times L} \\ 0_{L\times L} & \mathrm{diag}\left(\frac{1}{2\gamma_1^2}, \ldots, \frac{1}{2\gamma_L^2}\right) \end{bmatrix}. \tag{8}$$
Proof: See Appendix A.

For underdetermined problems, the lower bound on the estimate of $\mathbf{x}$ depends on the prior information through the diagonal matrix $\Gamma$. In the SBL problem, the realization of the random parameter $\gamma$ has to be used to compute the bound above, and hence, it is referred to as an online bound. Note that the lower bound on the MSE matrix of $\mathbf{x}$ is $\left(\frac{\Phi^T\Phi}{\sigma^2} + \Gamma^{-1}\right)^{-1}$, which is the same as the lower bound on the error covariance of the Bayes vector estimator for a linear model (see Theorems 10.2 and 10.3 in [19]), and is achievable by the MMSE estimator when $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_L)$ is known.
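The statement that the $\mathbf{x}$-block of $(H^{\theta})^{-1}$ coincides with the MMSE error covariance for known $\Gamma$ can be checked numerically. The sketch below, assuming NumPy and using hypothetical helper names, computes $(\Phi^T\Phi/\sigma^2 + \Gamma^{-1})^{-1}$ directly and compares it with the Gaussian posterior covariance $\Gamma - \Gamma\Phi^T\Sigma_y^{-1}\Phi\Gamma$ given by the matrix inversion lemma.

```python
import numpy as np

def hcrb_x(Phi, gamma, sigma2):
    """x-block of (H^theta)^{-1} in (8): (Phi^T Phi / sigma^2 + Gamma^{-1})^{-1}."""
    info = Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma)
    return np.linalg.inv(info)

def hcrb_gamma(gamma):
    """Diagonal of the gamma-block of (H^theta)^{-1} in (8): 2 gamma_i^2."""
    return 2.0 * gamma**2

def mmse_error_cov(Phi, gamma, sigma2):
    """Posterior covariance of x given y for known Gamma (only an N x N solve)."""
    N = Phi.shape[0]
    G = np.diag(gamma)
    Sigma_y = sigma2 * np.eye(N) + Phi @ G @ Phi.T
    return G - G @ Phi.T @ np.linalg.solve(Sigma_y, Phi @ G)

rng = np.random.default_rng(1)
N, L, sigma2 = 6, 12, 0.5
Phi = rng.choice([-1.0, 1.0], size=(N, L))
gamma = rng.gamma(2.0, 1.0, size=L)
print(np.allclose(hcrb_x(Phi, gamma, sigma2), mmse_error_cov(Phi, gamma, sigma2)))
```

In the underdetermined case ($N < L$), the second form is cheaper because it inverts an $N\times N$ matrix instead of an $L\times L$ one.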

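2) BCRB for $\theta = [\mathbf{x}, \gamma]$: In general, when the hyperparameters are deterministic, the resulting distribution of $\mathbf{x}$ does not correspond to a compressible prior. Hence, it is necessary to consider a hyperprior distribution on $\gamma$ to ensure that the resulting $\mathbf{x}$ is drawn from a compressible prior distribution. The most commonly used hyperprior distribution in the literature is the IG distribution [4], where $\gamma_i$ is distributed as $\mathrm{IG}\left(\frac{\nu}{2}, \frac{\nu}{2\lambda}\right)$, given by

$$p_{\Gamma}(\gamma) = \prod_{i=1}^{L}\frac{1}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\nu}{2\lambda}\right)^{\nu/2}\gamma_i^{-\left(\frac{\nu}{2}+1\right)}\exp\left\{-\frac{\nu}{2\lambda\gamma_i}\right\}; \quad \gamma_i \in (0, \infty),\ \nu, \lambda > 0. \tag{9}$$

Using the definitions and notation in the previous section, we state the following proposition.

Proposition 2: For the signal model in (1), the BCRB on the MSE matrix $E^{\theta}$ of an unknown random vector $\theta = [\mathbf{x}, \gamma]$, where the conditional distribution of the unknown compressible signal $\mathbf{x}/\gamma$ is $\mathcal{N}(0, \Gamma)$ and the hyperprior distribution on $\gamma$ is $\prod_{i=1}^{L}\mathrm{IG}\left(\frac{\nu}{2}, \frac{\nu}{2\lambda}\right)$, is given by $E^{\theta} \succeq (B^{\theta})^{-1}$, where

$$B^{\theta} = \begin{bmatrix} B^{\theta}(\mathbf{x}) & B^{\theta}(\mathbf{x},\gamma) \\ (B^{\theta}(\mathbf{x},\gamma))^T & B^{\theta}(\gamma) \end{bmatrix} = \begin{bmatrix} \frac{\Phi^T\Phi}{\sigma^2} + \lambda I_{L\times L} & 0_{L\times L} \\ 0_{L\times L} & \frac{\lambda^2(\nu+2)(\nu+7)}{2\nu} I_{L\times L} \end{bmatrix}. \tag{10}$$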



Proof: See Appendix B. 

It can be seen from $B^{\theta}$ that the lower bound on the MSE of $\hat{\gamma}(\mathbf{y})$ is a function of the parameters of the IG prior on the vector $\gamma$, i.e., of $\nu$ and $\lambda$, and that it can be computed without knowledge of the realization of $\gamma$. Thus, it is an offline bound.
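The lower-right block of (10) can be sanity-checked by Monte Carlo. The sketch below assumes NumPy, and the function names are hypothetical. It draws $\gamma_i \sim \mathrm{IG}(\nu/2, \nu/(2\lambda))$ as the reciprocal of a Gamma variate, draws $x_i/\gamma_i \sim \mathcal{N}(0,\gamma_i)$, and averages the negative second derivative of $\log p(x_i/\gamma_i) + \log p(\gamma_i)$ with respect to $\gamma_i$. The average should approach $\lambda^2(\nu+2)(\nu+7)/(2\nu)$. The same sampler also confirms $E[\gamma_i^{-1}] = \lambda$, which gives the $\lambda I_{L\times L}$ term in $B^{\theta}(\mathbf{x})$.

```python
import numpy as np

def bcrb_gamma_info(nu, lam):
    """Diagonal entry of B^theta(gamma) in (10)."""
    return lam**2 * (nu + 2) * (nu + 7) / (2 * nu)

def mc_bcrb_gamma_info(nu, lam, n=400_000, seed=0):
    """Monte Carlo estimate of -E[d^2/dgamma^2 {log p(x/gamma) + log p(gamma)}]
    for one coordinate; also returns the sample mean of 1/gamma."""
    rng = np.random.default_rng(seed)
    a, b = nu / 2, nu / (2 * lam)                   # IG(a, b) parameters from (9)
    g = rng.gamma(shape=a, scale=1.0 / b, size=n)   # g ~ Gamma(shape a, rate b)
    gamma = 1.0 / g                                 # hence gamma ~ IG(a, b)
    x = np.sqrt(gamma) * rng.standard_normal(n)     # x / gamma ~ N(0, gamma)
    # Negative second derivatives w.r.t. gamma of the two log-densities
    neg_d2_lik = -1.0 / (2 * gamma**2) + x**2 / gamma**3
    neg_d2_prior = -(a + 1) / gamma**2 + 2 * b / gamma**3
    return np.mean(neg_d2_lik + neg_d2_prior), np.mean(g)

est, mean_inv_gamma = mc_bcrb_gamma_info(nu=5.0, lam=1.0)
print(est, bcrb_gamma_info(5.0, 1.0), mean_inv_gamma)
```

The Monte Carlo average only agrees with the closed form up to sampling error, which is why a relative tolerance is appropriate when comparing the two.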

B. Bounds from Marginalized Distributions 

1) MCRB for $\theta = [\gamma]$: Here, we derive the MCRB for $\theta = [\gamma]$, where $\gamma$ is an unknown deterministic parameter. This requires the marginalized distribution $p_{Y;\gamma}(\mathbf{y};\gamma)$, which is obtained by considering the unknown compressible vector $\mathbf{x}$ as a nuisance variable and marginalizing it out of the joint distribution $p_{X,Y;\gamma}(\mathbf{x},\mathbf{y};\gamma)$ to obtain (3). Since $\gamma$ is a deterministic parameter, the pdf $p_{Y;\gamma}(\mathbf{y};\gamma)$ must satisfy the regularity condition in [19]. Using the definitions and notation of the previous sections, we state the following theorem to obtain the MCRB.

Theorem 1: For the signal model in (1), the log likelihood function $\log p_{Y;\gamma}(\mathbf{y};\gamma)$ satisfies the regularity conditions [19]. Further, the MCRB on the MSE matrix $E^{\gamma}$ of the unknown deterministic vector $\theta = [\gamma]$ is given by $E^{\gamma} \succeq (M^{\gamma})^{-1}$, where the $ij$th element of $M^{\gamma}$ is given by

$$(M^{\gamma})_{ij} = \frac{1}{2}\left(\phi_i^T\Sigma_y^{-1}\phi_j\right)^2, \tag{11}$$

for $1 \le i, j \le L$, where $\phi_i$ is the $i$th column of $\Phi$ and $\Sigma_y = \sigma^2 I_{N\times N} + \Phi\Gamma\Phi^T$, as defined earlier.
Proof: See Appendix C. 

To intuitively understand (11), we consider the special case $\Phi^T\Phi = N I_{L\times L}$ and use Woodbury's identity to simplify $\Sigma_y^{-1}$, which yields the $(ii)$th entry of the matrix $M^{\gamma}$ as

$$(M^{\gamma})_{ii} = \frac{1}{2}\left(\frac{\sigma^2}{N} + \gamma_i\right)^{-2}. \tag{12}$$

Hence, the error in $\gamma_i$ is bounded as $E^{\gamma}_{ii} \ge 2\left(\frac{\sigma^2}{N} + \gamma_i\right)^2$. As $N \to \infty$, the bound reduces to $2\gamma_i^2$, which is the same as the lower bound on the estimate of $\gamma_i$ obtained from the lower-right submatrix in (8). For finite $N$, the MCRB is tighter than the HCRB.
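Equation (11) can be evaluated directly, and the special case (12) serves as a sanity check. In the sketch below (NumPy assumed; `mcrb_gamma_info` is a hypothetical name), the check builds $\Phi = \sqrt{N}\,Q$ from a matrix $Q$ with orthonormal columns, so that $\Phi^T\Phi = N I_{L\times L}$ holds exactly. This construction requires $N \ge L$, so it illustrates (12) rather than the underdetermined setting.

```python
import numpy as np

def mcrb_gamma_info(Phi, gamma, sigma2):
    """M^gamma of (11): entries 0.5 * (phi_i^T Sigma_y^{-1} phi_j)^2."""
    N = Phi.shape[0]
    Sigma_y = sigma2 * np.eye(N) + (Phi * gamma) @ Phi.T
    A = Phi.T @ np.linalg.solve(Sigma_y, Phi)   # A[i, j] = phi_i^T Sigma_y^{-1} phi_j
    return 0.5 * A**2                           # elementwise square

# Special case behind (12): Phi^T Phi = N I (requires N >= L)
rng = np.random.default_rng(2)
N, L, sigma2 = 16, 4, 0.3
Q, _ = np.linalg.qr(rng.standard_normal((N, L)))   # Q has orthonormal columns
Phi = np.sqrt(N) * Q
gamma = np.array([0.5, 1.0, 2.0, 4.0])
M = mcrb_gamma_info(Phi, gamma, sigma2)
print(np.diag(M))
print(0.5 * (sigma2 / N + gamma) ** -2)   # closed form (12)
```

In this special case $M^{\gamma}$ is diagonal, so the per-coordinate bound $2(\sigma^2/N + \gamma_i)^2$ can be read off directly and is never below the HCRB value $2\gamma_i^2$.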

As mentioned earlier, the deterministic assumption on $\gamma$ does not lead to a compressible prior on $\mathbf{x}$. Hence, we next assume a hyperprior on $\gamma$ and derive the MCRB on $\mathbf{x}$.

2) MCRB for $\theta = [\mathbf{x}]$: In this subsection, we assume a hyperprior on $\gamma$, which leads to a joint distribution of $\mathbf{x}$ and $\gamma$, from which $\gamma$ can be marginalized. Further, assuming specific forms for the hyperprior distribution can lead to a compressible prior on $\mathbf{x}$. For example, assuming an IG hyperprior on $\gamma$ leads to an $\mathbf{x}$ with a Student-t distribution. In [2], the authors show that sampling from a Student-t distribution with parameters $\nu$ and $\lambda$ results in a $\nu$-compressible $\mathbf{x}$. The Student-t prior is given by

$$p_X(\mathbf{x}) = \prod_{i=1}^{L}\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left(1 + \frac{\lambda x_i^2}{\nu}\right)^{-(\nu+1)/2}; \quad x_i \in (-\infty, \infty), \tag{13}$$

where $\nu$ represents the number of degrees of freedom and $\lambda$ is the inverse scale (precision) parameter of the distribution; for $\nu > 2$, the variance is $\nu/((\nu-2)\lambda)$. Using the notation developed so far, we state the following theorem.
Theorem 2: For the signal model in (1), the MCRB on the MSE matrix $E^{\mathbf{x}}$ of the unknown compressible random vector $\theta = [\mathbf{x}]$ distributed as (13) is given by $E^{\mathbf{x}} \succeq (M^{\mathbf{x}})^{-1}$, where

$$M^{\mathbf{x}} = \frac{\Phi^T\Phi}{\sigma^2} + \frac{\lambda(\nu+1)}{\nu+3} I_{L\times L}. \tag{14}$$

Proof: See Appendix D. 

We see that the derived bound depends on the parameters of the Student-t pdf. From [3], the prior is somewhat compressible for $2 < \nu < 4$, and (14) is nonnegative and bounded for $2 < \nu < 4$, i.e., the bound is meaningful in the range of $\nu$ used in practice. Note that by choosing $\lambda$ to be large (i.e., the variance of $\mathbf{x}$ to be small), the bound is dominated by the prior information rather than by the information from the observations, as expected in Bayesian bounds [19]. Also, contrary to the remark in [22], we show in the section on numerical results that the MCRB (14) is not necessarily always tighter than the BCRB
(8) for the SBL problem.
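The prior term in (14) is the Fisher information of the univariate Student-t density (13) with respect to its location, $E[(\partial_x \log p(x))^2] = \lambda(\nu+1)/(\nu+3)$. The sketch below (NumPy assumed; helper names hypothetical) checks this value by trapezoidal integration on a wide grid and then forms the bound $(M^{\mathbf{x}})^{-1}$. The grid half-width and spacing are illustrative choices that are adequate for moderate $\lambda$ and $\nu$, not tuned values.

```python
import math
import numpy as np

def student_t_pdf(x, nu, lam):
    """Univariate factor of (13)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             + 0.5 * math.log(lam / (math.pi * nu)))
    return np.exp(log_c - 0.5 * (nu + 1) * np.log1p(lam * x**2 / nu))

def student_t_location_info(nu, lam, half_width=200.0, n=400_001):
    """Trapezoidal estimate of E[(d/dx log p(x))^2] for the density in (13)."""
    x = np.linspace(-half_width, half_width, n)
    score = -(nu + 1) * lam * x / (nu + lam * x**2)   # d/dx log p(x)
    f = score**2 * student_t_pdf(x, nu, lam)
    h = x[1] - x[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))       # trapezoidal rule

def mcrb_x(Phi, sigma2, nu, lam):
    """(M^x)^{-1} with M^x from (14)."""
    L = Phi.shape[1]
    M = Phi.T @ Phi / sigma2 + lam * (nu + 1) / (nu + 3) * np.eye(L)
    return np.linalg.inv(M)

print(student_t_location_info(3.0, 2.0), 2.0 * 4.0 / 6.0)
```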

The techniques used to derive the bounds in this subsection can be applied to any family of compressible distributions. In [3], the authors propose a parametric form of the Generalized Compressible Prior (GCP) and prove that such a prior is compressible for certain values of $\nu$. In the following subsection, we derive the MCRB for the GCP.

C. General Marginalized Bounds 

In this subsection, we derive MCRBs for the parametric form of the GCP. The GCP encompasses the double Pareto shrinkage type prior [24] and the Student-t prior (13) as special cases. We consider the GCP on $\mathbf{x}$ as follows:

$$p_X(\mathbf{x}) = K^L \prod_{i=1}^{L}\left(1 + \frac{\lambda|x_i|^{\tau}}{\nu}\right)^{-(\nu+1)/\tau}; \quad x_i \in (-\infty, \infty),\ \tau, \nu, \lambda > 0, \tag{15}$$

where $K = \frac{\tau}{2}\left(\frac{\lambda}{\nu}\right)^{1/\tau}\frac{\Gamma((\nu+1)/\tau)}{\Gamma(1/\tau)\,\Gamma(\nu/\tau)}$. When $\tau = 2$, the above distribution reduces to the Student-t prior given in (13), and when $\tau = 1$, it reduces to a generalized double Pareto shrinkage prior [24]. Note that the expression for the GCP in [3] can be obtained from (15) by setting $\lambda = 1$ and defining $\nu = s - 1$. The following theorem provides the MCRB for the GCP.

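Theorem 3: For the signal model in (1), the MCRB on the MSE matrix $E^{\mathbf{x}}_{\tau}$ of the unknown random vector $\theta = [\mathbf{x}]$, where $\mathbf{x}$ is distributed according to the GCP in (15), is given by $E^{\mathbf{x}}_{\tau} \succeq (M^{\mathbf{x}}_{\tau})^{-1}$, where

$$M^{\mathbf{x}}_{\tau} = \frac{\Phi^T\Phi}{\sigma^2} + \mathcal{T} I_{L\times L}, \quad \text{with} \quad \mathcal{T} \triangleq \frac{\tau^2(\nu+1)}{\nu+\tau+1}\left(\frac{\lambda}{\nu}\right)^{2/\tau}\frac{\Gamma\left(\frac{\nu+2}{\tau}\right)\Gamma\left(2 - \frac{1}{\tau}\right)}{\Gamma\left(\frac{1}{\tau}\right)\Gamma\left(\frac{\nu}{\tau}\right)}. \tag{16}$$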
Proof: See Appendix E. 

It is straightforward to verify that for $\tau = 2$, (16) reduces to the MCRB derived in (14) for the Student-t distribution. For $\tau = 1$, the inverse of the MCRB reduces to

$$M^{\mathbf{x}}_{1} = \frac{\Phi^T\Phi}{\sigma^2} + \frac{\lambda^2(\nu+1)^2}{\nu(\nu+2)} I_{L\times L}. \tag{17}$$



Hence, this method is useful in obtaining a Cramer Rao type lower bound for estimators based on the double Pareto shrinkage prior, which uses the generalized prior with $\tau = 1$ [24], [25].
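The scalar $\mathcal{T}$ in (16) and the constant $K$ in (15) are easy to evaluate with log-Gamma functions. The sketch below uses only the standard-library `math` module, and the function names are hypothetical. It implements both quantities and checks the two reductions stated above: $\tau = 2$ recovers the prior term of (14), and $\tau = 1$ recovers (17). The guard $\tau > 1/2$ reflects that $\Gamma(2 - 1/\tau)$, and the underlying information integral, are finite only in that range.

```python
import math

def gcp_log_K(tau, nu, lam):
    """log of the per-coordinate normalizing constant K in (15)."""
    return (math.log(tau / 2) + math.log(lam / nu) / tau
            + math.lgamma((nu + 1) / tau)
            - math.lgamma(1 / tau) - math.lgamma(nu / tau))

def gcp_prior_info(tau, nu, lam):
    """Scalar calT in (16): prior Fisher information of one GCP coordinate."""
    if tau <= 0.5:
        raise ValueError("tau must exceed 1/2 for the information to be finite")
    log_ratio = (math.lgamma((nu + 2) / tau) + math.lgamma(2 - 1 / tau)
                 - math.lgamma(1 / tau) - math.lgamma(nu / tau))
    return (tau**2 * (nu + 1) / (nu + tau + 1)
            * (lam / nu) ** (2 / tau) * math.exp(log_ratio))

nu, lam = 2.01, 3.0
print(gcp_prior_info(2.0, nu, lam), lam * (nu + 1) / (nu + 3))               # tau = 2 vs (14)
print(gcp_prior_info(1.0, nu, lam), lam**2 * (nu + 1)**2 / (nu * (nu + 2)))  # tau = 1 vs (17)
```

Sweeping `tau` with this function gives a quick way to explore the trend discussed next for Fig. 4, since a larger $\mathcal{T}$ means a smaller bound once the data term $\Phi^T\Phi/\sigma^2$ is fixed.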

Further, we plot the expression (16) in Fig. 4 and observe that, in general, the bounds predict an increase in MSE for higher values of $\tau$. Also, the lower bounds at different signal to noise ratios (SNR) converge as the value of $\tau$ increases at a given value of $N$, indicating that increasing $\tau$ renders the bound insensitive to the SNR. The lower bounds also predict a smaller value of MSE for a lower value of $\nu$.




Fig. 4. The MCRB (16) for the parametric form of the GCP as a function of $\tau$, for different values of $\nu$, $N$ and $\xi = \sigma^2$.
Thus far, we presented the lower bounds on the MSE in estimating the unknown parameters of the
SBL problem when the noise variance is known. In the next section, we extend the results to the case 
of unknown noise variance. 

IV. SMV-SBL: Lower Bounds when $\sigma^2$ is Unknown

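Let us denote the unknown noise variance as $\xi = \sigma^2$. In the Bayesian formulation, the unknown noise variance is associated with a prior, and since the IG prior is conjugate to the Gaussian likelihood $p_{Y/X,\Xi}(\mathbf{y}/\mathbf{x},\xi)$, it is assumed that $\sigma^2 \sim \mathrm{IG}(c, d)$ [4], i.e., $\xi = \sigma^2$ is distributed as

$$p_{\Xi}(\xi) = \frac{d^c}{\Gamma(c)}\,\xi^{(-c-1)}\exp\left\{-\frac{d}{\xi}\right\}; \quad \xi \in (0, \infty),\ c, d > 0. \tag{18}$$

Under this assumption, one can marginalize the unknown noise variance and obtain the marginalized likelihood $p(\mathbf{y}/\mathbf{x})$ as

$$p(\mathbf{y}/\mathbf{x}) = \int_0^{\infty} p_{Y,\Xi/X}(\mathbf{y},\xi/\mathbf{x})\,d\xi = \frac{(2d)^c\,\Gamma\left(\frac{N}{2}+c\right)}{\pi^{N/2}\,\Gamma(c)}\left((\mathbf{y}-\Phi\mathbf{x})^T(\mathbf{y}-\Phi\mathbf{x}) + 2d\right)^{-\left(\frac{N}{2}+c\right)}, \tag{19}$$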

which is a multivariate Student-t distribution. It turns out that the straightforward approach of using the above multivariate likelihood to directly compute lower bounds for the various cases given in the previous section is analytically intractable, and the lower bounds cannot be computed in closed form. Hence, we compute lower bounds from the joint pdf, i.e., we derive the HCRB and BCRBs for the unknown vector $\theta = [\mathbf{x}, \gamma, \xi]$ with the MSE matrix $E^{\theta}_{\xi}$ defined by (4)$^{2}$. Using the assumptions and notation from the previous sections, we obtain the following proposition.

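Proposition 3: For the signal model in (1), the HCRB on the MSE matrix $E^{\theta}_{\xi}$ of the unknown vector $\theta = [\mathbf{x}, \gamma, \xi]$, with the conditional distribution of the unknown compressible vector $\mathbf{x}/\gamma$ being $\mathcal{N}(0, \Gamma)$ and $\xi$ modeled as an unknown deterministic parameter, is given by $E^{\theta}_{\xi} \succeq (H^{\theta}_{\xi})^{-1}$, where

$$H^{\theta}_{\xi} = \begin{bmatrix} H^{\theta} & 0_{2L\times 1} \\ 0_{1\times 2L} & \frac{N}{2\xi^2} \end{bmatrix}. \tag{20}$$

In the above expression, with a slight abuse of notation, $H^{\theta}$ is the FIM given by (8).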
Proof: See Appendix F. 

$^{2}$We use the subscript $\xi$ to indicate that the error matrices and bounds are obtained for the case of unknown noise variance.
The lower bound on the estimation of $\xi$ matches the well-known lower bound on noise variance estimation (see Sec. 3.5 in [19]). One disadvantage of such a bound on $\hat{\xi}(\mathbf{y})$ is that knowledge of the noise variance is essential to compute the bound, and hence, it cannot be computed offline. Instead, assigning a hyperprior to $\xi$ results in a lower bound that depends only on the parameters of the hyperprior, which are assumed to be known, allowing the bound to be computed offline. We state the following proposition in this context.

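Proposition 4: For the signal model in (1), the HCRB on the MSE matrix $E^{\theta}_{\xi}$ of the unknown random vector $\theta = [\mathbf{x}, \gamma, \xi]$, with the conditional distribution of the unknown compressible vector $\mathbf{x}/\gamma$ being $\mathcal{N}(0, \Gamma)$, $\gamma$ modeled as an unknown deterministic or random parameter, and the unknown random parameter $\xi$ distributed as $\mathrm{IG}(c, d)$, is given by $E^{\theta}_{\xi} \succeq (H^{\theta}_{\xi})^{-1}$, where

$$H^{\theta}_{\xi} = \begin{bmatrix} H^{\theta} & 0_{2L\times 1} \\ 0_{1\times 2L} & \frac{c(c+1)(N/2 + c + 3)}{d^2} \end{bmatrix}. \tag{21}$$

In the above expression, $H^{\theta}$ is the FIM given in (8).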
Proof: See Appendix G. 

In SBL problems, a non-informative prior on $\xi$ is typically preferred, i.e., the distribution of the noise variance is modeled to be as flat as possible. In [4], it was observed that the non-informative prior is obtained when $c, d \to 0$. However, for these values of $c$ and $d$, the bound in (21) is indeterminate. In the section on numerical results, we illustrate the performance of the lower bound in (21) for practical values of $c$ and $d$.
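To see why (21) is indeterminate as $c, d \to 0$, note that its lower-right entry behaves like $c/d^2$, a $0/0$ form whose limit depends on how $c$ and $d$ approach zero. The short sketch below is plain Python, and the function names and the three paths are illustrative assumptions. It evaluates the entries of (20) and (21) along those paths.

```python
def hcrb_xi_info(N, xi):
    """Lower-right entry of (20): deterministic xi."""
    return N / (2 * xi**2)

def bcrb_xi_info(N, c, d):
    """Lower-right entry of (21): xi ~ IG(c, d)."""
    return c * (c + 1) * (N / 2 + c + 3) / d**2

N = 100
for eps in (1e-2, 1e-4, 1e-8):
    print(eps,
          bcrb_xi_info(N, eps, eps),          # c = d = eps: information grows without bound (bound -> 0)
          bcrb_xi_info(N, eps, eps**0.5),     # d^2 = c: information tends to N/2 + 3
          bcrb_xi_info(N, eps, eps**0.25))    # d^2 = sqrt(c): information tends to 0 (bound -> infinity)
```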

A. Marginalized Bounds 

In this subsection, we obtain lower bounds on the MSE of the estimator $\hat{\xi}(\mathbf{y})$ in the presence of nuisance variables in the joint distribution. To start with, we marginalize $\mathbf{x}$ and consider the distribution $p_{Y;\gamma,\xi}(\mathbf{y};\gamma,\xi)$, where both $\gamma$ and $\xi$ are deterministic variables. Since the unknowns are deterministic, the regularity condition has to be satisfied for $\theta = [\gamma, \xi]$. We state the following theorem.

Theorem 4: For the signal model in (1), the log likelihood function $\log p_{Y;\gamma,\xi}(\mathbf{y};\gamma,\xi)$ satisfies the regularity condition [19]. Further, the MCRB on the MSE matrix $E^{\theta}_{\xi}$ of the unknown deterministic vector $\theta = [\gamma, \xi]$ is given by $E^{\theta}_{\xi} \succeq (M^{\theta}_{\xi})^{-1}$, where

$$M^{\theta}_{\xi} = \begin{bmatrix} M^{\theta}_{\xi}(\gamma) & M^{\theta}_{\xi}(\gamma,\xi) \\ M^{\theta}_{\xi}(\xi,\gamma) & M^{\theta}_{\xi}(\xi) \end{bmatrix}, \tag{22}$$

where the $ij$th entry of the matrix $M^{\theta}_{\xi}(\gamma)$ is given by $(M^{\theta}_{\xi}(\gamma))_{ij} = \frac{1}{2}(\phi_i^T\Sigma_y^{-1}\phi_j)^2$, $M^{\theta}_{\xi}(\xi) = \frac{1}{2}\mathrm{Tr}(\Sigma_y^{-2})$, and the $i$th entries of the vectors $M^{\theta}_{\xi}(\gamma,\xi)$ and $M^{\theta}_{\xi}(\xi,\gamma)$ are $(M^{\theta}_{\xi}(\gamma,\xi))_i = (M^{\theta}_{\xi}(\xi,\gamma))_i = \frac{1}{2}\phi_i^T\Sigma_y^{-2}\phi_i$.

Proof: See Appendix H.

Remark: From the graphical model in Fig. 1, it can be seen that the branches containing $\gamma_i$ and $\xi$ are independent conditioned on $\mathbf{x}$. However, when $\mathbf{x}$ is marginalized, the nodes $\xi$ and $\gamma_i$ become connected, and hence, Lemma 1 is no longer valid. Due to this, the lower bound on $\gamma$ depends on $\xi$ and vice versa, i.e., $M^{\theta}_{\xi}(\gamma)$ and $M^{\theta}_{\xi}(\xi)$ depend on both $\xi$ and $\Gamma = \mathrm{diag}(\gamma)$ through $\Sigma_y = \xi I_{N\times N} + \Phi\Gamma\Phi^T$.
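One way to check (22) is to compare it with the generic FIM of a zero-mean Gaussian, $\frac{1}{2}\mathrm{Tr}(\Sigma_y^{-1}\partial_i\Sigma_y\,\Sigma_y^{-1}\partial_j\Sigma_y)$, using $\partial\Sigma_y/\partial\gamma_i = \phi_i\phi_i^T$ and $\partial\Sigma_y/\partial\xi = I_{N\times N}$. The sketch below (NumPy assumed; function names hypothetical) builds both matrices and compares them. The nonzero last column is the coupling between $\gamma$ and $\xi$ described in the Remark.

```python
import numpy as np

def mcrb_gamma_xi_info(Phi, gamma, xi):
    """Closed-form M^theta_xi of (22) for theta = [gamma, xi]."""
    N, L = Phi.shape
    Sigma_y = xi * np.eye(N) + (Phi * gamma) @ Phi.T
    S_inv = np.linalg.inv(Sigma_y)
    S_inv2 = S_inv @ S_inv
    A = Phi.T @ S_inv @ Phi                                  # phi_i^T Sigma_y^{-1} phi_j
    cross = 0.5 * np.einsum('ni,nm,mi->i', Phi, S_inv2, Phi)  # 0.5 phi_i^T Sigma_y^{-2} phi_i
    M = np.empty((L + 1, L + 1))
    M[:L, :L] = 0.5 * A**2
    M[:L, L] = cross
    M[L, :L] = cross
    M[L, L] = 0.5 * np.trace(S_inv2)
    return M

def gaussian_fim(Sigma, dSigmas):
    """FIM of y ~ N(0, Sigma(theta)): 0.5 * Tr(S^-1 dS_i S^-1 dS_j)."""
    S_inv = np.linalg.inv(Sigma)
    P = [S_inv @ D for D in dSigmas]
    return np.array([[0.5 * np.trace(Pi @ Pj) for Pj in P] for Pi in P])

rng = np.random.default_rng(3)
N, L, xi = 6, 4, 0.2
Phi = rng.choice([-1.0, 1.0], size=(N, L))
gamma = rng.gamma(2.0, 1.0, size=L)
M_closed = mcrb_gamma_xi_info(Phi, gamma, xi)
Sigma_y = xi * np.eye(N) + (Phi * gamma) @ Phi.T
dS = [np.outer(Phi[:, i], Phi[:, i]) for i in range(L)] + [np.eye(N)]
M_generic = gaussian_fim(Sigma_y, dS)
print(np.allclose(M_closed, M_generic))
```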

Thus far, we have presented several bounds on the MSE performance of the estimators $\hat{\mathbf{x}}(\mathbf{y})$, $\hat{\gamma}(\mathbf{y})$ and $\hat{\xi}(\mathbf{y})$ in the SMV-SBL framework. In the next section, we derive Cramer Rao type lower bounds for the MMV-SBL signal model.

V. Lower Bounds for the MMV-SBL 

In this section, we provide Cramer Rao type lower bounds for the estimation of the unknown parameters in the MMV-SBL model given in (2). We consider the estimation of the compressible vector $\mathbf{w}$ from the vector of observations $\mathbf{t}$, which contain the stacked columns of $\mathbf{W}$ and $\mathbf{T}$, respectively. In the MMV-SBL model, each column of $\mathbf{W}$ is distributed as $\mathbf{w}_i/\gamma \sim \mathcal{N}(0, \Gamma)$, for $i = 1, \ldots, M$, and the likelihood is given by $p_{T/W,\Xi}(\mathbf{t}/\mathbf{w},\xi) = \prod_{i=1}^{M} p_{T/W,\Xi}(\mathbf{t}_i/\mathbf{w}_i,\xi)$, where $p_{T/W,\Xi}(\mathbf{t}_i/\mathbf{w}_i,\xi) = \mathcal{N}(\Phi\mathbf{w}_i, \xi I_{N\times N})$ and $\xi = \sigma^2$. The modeling assumptions on $\gamma$ and $\xi$ remain the same as in the SMV-SBL case, given by (9) and (18), respectively [18].

Using the notation developed in Section II, we derive the bounds for the MMV-SBL case similarly to the SMV-SBL cases considered in Sections III and IV. Since the derivation of these bounds follows along the same lines as in the previous sections, we simply state the different lower bounds in Table I.

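We see that the lower bounds on $\hat{\mathbf{w}}(\mathbf{y})$, $\hat{\gamma}(\mathbf{y})$ and $\hat{\xi}(\mathbf{y})$ are reduced by a factor of $M$ compared to the SMV case. This is intuitively satisfying, since a higher number of observations are available for the estimation of the parameters $\mathbf{w}$, $\gamma$ and $\xi$. It turns out that it is not possible to obtain the MCRB on $\mathbf{w}$ in the MMV-SBL setting, since closed form expressions for the FIM are not available.

TABLE I
Cramer Rao Type Bounds for the MMV-SBL Case.

Sl. No. / Bound Derived / Expression
1. HCRB on $\hat{\gamma}(\mathbf{y})$: $H^{\theta}_M(\gamma) = \mathrm{diag}\left(\frac{M}{2\gamma_i^2}\right)$, for $i = 1, 2, \ldots, L$
2. BCRB on $\hat{\gamma}(\mathbf{y})$: $B^{\theta}_M(\gamma) = \frac{\lambda^2(\nu+2)(M+\nu+6)}{2\nu} I_{L\times L}$
3. MCRB on $\hat{\gamma}(\mathbf{y})$: $M^{\gamma}_M = M\,M^{\gamma}$, where $(M^{\gamma})_{ij} = \frac{1}{2}(\phi_i^T\Sigma_y^{-1}\phi_j)^2$
4. BCRB on $\hat{\mathbf{w}}(\mathbf{y})$: $H^{\theta}_M(\mathbf{w}) = \left(\frac{\Phi^T\Phi}{\sigma^2} + \Gamma^{-1}\right) \otimes I_{M\times M}$
5. HCRB on $\hat{\xi}(\mathbf{y})$: $H^{\theta}_M(\xi) = \frac{MN}{2\xi^2}$
6. BCRB on $\hat{\xi}(\mathbf{y})$: $B^{\theta}_M(\xi) = \frac{c(c+1)(MN/2 + c + 3)}{d^2}$
7. MCRB on $[\hat{\gamma}(\mathbf{y}), \hat{\xi}(\mathbf{y})]$: $M^{\theta}_{\xi,M} = M \times M^{\theta}_{\xi}$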
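Row 3 of Table I follows because the $M$ columns of $\mathbf{T}$ are independent given $\gamma$ and $\xi$, so their Fisher informations add. The sketch below (NumPy assumed; names hypothetical) checks this by computing the generic Gaussian FIM of the stacked observation vector $\mathbf{t}$, whose covariance is $I_{M\times M} \otimes \Sigma_y$ when the columns are stacked, and comparing it with $M$ times the SMV matrix of (11).

```python
import numpy as np

def gaussian_fim(Sigma, dSigmas):
    """FIM of y ~ N(0, Sigma(theta)): 0.5 * Tr(S^-1 dS_i S^-1 dS_j)."""
    S_inv = np.linalg.inv(Sigma)
    P = [S_inv @ D for D in dSigmas]
    return np.array([[0.5 * np.trace(Pi @ Pj) for Pj in P] for Pi in P])

def mmv_mcrb_gamma_info(Phi, gamma, xi, M):
    """Row 3 of Table I: M times the SMV matrix M^gamma of (11)."""
    N = Phi.shape[0]
    Sigma_y = xi * np.eye(N) + (Phi * gamma) @ Phi.T
    A = Phi.T @ np.linalg.solve(Sigma_y, Phi)
    return M * 0.5 * A**2

rng = np.random.default_rng(4)
N, L, M, xi = 5, 3, 4, 0.3
Phi = rng.choice([-1.0, 1.0], size=(N, L))
gamma = rng.gamma(2.0, 1.0, size=L)
Sigma_y = xi * np.eye(N) + (Phi * gamma) @ Phi.T
I_M = np.eye(M)
# Covariance of t = vec(T) (columns stacked) and its derivatives w.r.t. gamma_i
Sigma_t = np.kron(I_M, Sigma_y)
dS = [np.kron(I_M, np.outer(Phi[:, i], Phi[:, i])) for i in range(L)]
print(np.allclose(gaussian_fim(Sigma_t, dS), mmv_mcrb_gamma_info(Phi, gamma, xi, M)))
```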

In the next section, we consider two algorithms for SBL, namely the EM algorithm and the ARD-based reweighted $\ell_1$ algorithm, and numerically illustrate the efficacy of the lower bounds derived so far.

VI. Simulations and Discussion 

The vector estimation problem in the SBL framework typically involves the joint estimation of the hyperparameter and the unknown compressible vector $\mathbf{x}$. Since the hyperparameter estimation problem cannot be solved in closed form, iterative estimators are employed [4]. In this section, we consider the iterative updates based on the EM algorithm first proposed in [4]. We also consider the algorithm proposed in [6] based on the Automatic Relevance Determination (ARD) framework, and compare the performance of these estimators against the derived lower bounds. We use the lower bounds to benchmark the MSE of the EM algorithm (labeled EM) and the ARD-based reweighted $\ell_1$ algorithm (labeled ARD-SBL) for the linear models in (1) and (2), for estimating $\mathbf{x}$, $\gamma$ and $\xi$.
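For reference, below is a minimal sketch of EM iterations for SBL with known noise variance (NumPy assumed; function names hypothetical). It is the maximum-likelihood type-II form without the IG hyperprior on $\gamma$: the E-step computes the Gaussian posterior of $\mathbf{x}$, and the M-step sets $\gamma_i$ to the posterior second moment $\mu_i^2 + (\Sigma_x)_{ii}$. It is not necessarily the exact variant used to generate the figures. Because it is an EM algorithm, the marginalized log-likelihood (3) should not decrease from one iteration to the next, and the example tracks this.

```python
import numpy as np

def log_marginal(y, Phi, gamma, sigma2):
    """L(gamma) of (3), up to the additive constant."""
    N = len(y)
    Sigma_y = sigma2 * np.eye(N) + (Phi * gamma) @ Phi.T
    _, logdet = np.linalg.slogdet(Sigma_y)
    return -0.5 * (logdet + y @ np.linalg.solve(Sigma_y, y))

def sbl_em(y, Phi, sigma2, n_iter=50):
    """EM for SBL with known sigma^2 and no hyperprior on gamma.
    Returns the posterior mean of x under the final gamma, gamma itself,
    and L(gamma) recorded after each M-step."""
    N, L = Phi.shape
    gamma = np.ones(L)
    history = []
    for _ in range(n_iter):
        G = np.diag(gamma)
        Sigma_y = sigma2 * np.eye(N) + Phi @ G @ Phi.T
        K = G @ Phi.T @ np.linalg.inv(Sigma_y)   # Gamma Phi^T Sigma_y^{-1}
        mu = K @ y                               # E-step: posterior mean of x
        Sigma_x = G - K @ Phi @ G                # E-step: posterior covariance of x
        gamma = mu**2 + np.diag(Sigma_x)         # M-step: E[x_i^2 | y]
        history.append(log_marginal(y, Phi, gamma, sigma2))
    G = np.diag(gamma)
    mu = G @ Phi.T @ np.linalg.solve(sigma2 * np.eye(N) + Phi @ G @ Phi.T, y)
    return mu, gamma, history

rng = np.random.default_rng(5)
N, L, sigma2 = 20, 40, 0.05
Phi = rng.choice([-1.0, 1.0], size=(N, L))
x_true = 0.5 * rng.standard_t(df=2.5, size=L)
y = Phi @ x_true + np.sqrt(sigma2) * rng.standard_normal(N)
x_hat, gamma_hat, hist = sbl_em(y, Phi, sigma2)
print(hist[0], hist[-1])
```

Averaging $\|\hat{\mathbf{x}} - \mathbf{x}\|^2$ over many such trials gives the kind of empirical MSE that is compared against the bounds below.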

We simulate the lower bounds for a random underdetermined ($N < L$) measurement matrix $\Phi$, whose entries are i.i.d. and standard Bernoulli ($\{+1, -1\}$) distributed. A compressible signal of dimension $L$ is generated by sampling from a Student-t distribution with the value of $\nu$ ranging from 2.01 to 2.05, which is the range in which the signal is somewhat compressible for high dimensional signals [3]. Figure 5 shows the decay profile of the sorted magnitudes of $L = 1024$ i.i.d. samples drawn from a Student-t distribution for different degrees of freedom, with the value of $E(x_i^2)$ fixed at $10^{-3}$.

Fig. 5. Decay profile of the sorted magnitudes of i.i.d. samples drawn from a Student-t distribution (horizontal axis: length of the compressible vector $L$; curves shown for $\nu = 2.05$ and $\nu = 2.1$).
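The simulation setup can be sketched as follows (NumPy assumed; function names hypothetical). The compressible signal is drawn through the SBL hierarchy itself, $\gamma_i \sim \mathrm{IG}(\nu/2, \nu/(2\lambda))$ followed by $x_i/\gamma_i \sim \mathcal{N}(0, \gamma_i)$, which yields the Student-t law (13). Here $\lambda$ is chosen so that $E(x_i^2) = \nu/((\nu-2)\lambda)$ equals a target second moment, which requires $\nu > 2$. Sorting the magnitudes in decreasing order gives the kind of decay profile shown in Fig. 5.

```python
import numpy as np

def student_t_signal(L, nu, second_moment, rng):
    """Draw L i.i.d. Student-t samples via the SBL hierarchy, with lambda
    chosen so that E[x_i^2] = nu / ((nu - 2) * lambda) = second_moment."""
    if nu <= 2:
        raise ValueError("E[x_i^2] is finite only for nu > 2")
    lam = nu / ((nu - 2) * second_moment)
    a, b = nu / 2, nu / (2 * lam)                       # IG parameters from (9)
    gamma = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=L)
    return np.sqrt(gamma) * rng.standard_normal(L)

def bernoulli_matrix(N, L, rng):
    """Measurement matrix with i.i.d. equiprobable +/-1 entries."""
    return rng.choice([-1.0, 1.0], size=(N, L))

rng = np.random.default_rng(6)
for nu in (2.05, 2.1):
    x = student_t_signal(1024, nu, 1e-3, rng)
    decay = np.sort(np.abs(x))[::-1]    # sorted magnitudes, largest first
    print(nu, decay[:3], decay[-1])
Phi = bernoulli_matrix(750, 1024, rng)
```

Because $E(x_i^2)$ is barely finite for $\nu$ close to 2, the sample second moment of a single draw can be far from the target, so the profile is more informative than the empirical variance.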

A. Lower Bounds on the MSE Performance of $\hat{\mathbf{x}}(\mathbf{y})$

In this subsection, we compare the MSE performance of the ARD-SBL estimator and the EM-based estimator $\hat{x}(y)$. Figure 6 depicts the MSE performance of $\hat{x}(y)$ for different SNRs and $N = 750$ and $1000$, with $\nu = 2.01$. We compare it with the HCRB/BCRB derived in (8), which is obtained by assuming knowledge of the realization of the hyperparameters $\gamma$. We see that the MCRB derived in (14) is a tight lower bound on the MSE performance at high SNR and $N$. At low SNR, the BCRB turns out to be a tighter lower bound, and a cross-over is observed. Also, at low SNR, since the noise variance is high, the MCRB in (14) is dominated by the term $\frac{\lambda(\nu+1)}{\nu+3} I_{L\times L}$. Hence, the MCRB floors at $\frac{L(\nu+3)}{\lambda(\nu+1)} = 0.0085$, which matches the value in Fig. 6 at SNR = 10 dB.
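The floor value quoted above can be reproduced with a few lines of arithmetic. This sketch assumes a Student-t prior parameterized so that $\mathbb{E}(x_i^2) = \nu/(\lambda(\nu-2))$, with $\mathbb{E}(x_i^2) = 10^{-3}$, that the plotted MSE is summed over all $L$ components, and that $L = 1024$ as in Fig. 5; these are assumptions rather than stated settings for Fig. 6, and `mcrb_floor` is a hypothetical helper.

```python
def mcrb_floor(length, nu, second_moment):
    """Trace of (lam*(nu+1)/(nu+3) * I_L)^{-1}, i.e. L*(nu+3)/(lam*(nu+1)),
    the value the MCRB (14) approaches when the prior term dominates at low SNR."""
    lam = nu / ((nu - 2.0) * second_moment)  # assumed: E(x_i^2) = nu / (lam*(nu - 2))
    return length * (nu + 3.0) / (lam * (nu + 1.0))


floor = mcrb_floor(1024, 2.01, 1e-3)
print(round(floor, 4))  # prints 0.0085
```

With these assumptions $\lambda \approx 2.01\times10^{5}$, and the trace of the inverse prior information is about $0.00848$, consistent with the 0.0085 floor read from Fig. 6.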
Fig. 6. Plot of the MSE performance of $\hat{x}(y)$, the corresponding MCRB and BCRB as a function of SNR, where $\nu = 2.01$. (Curves: ARD-SBL, EM, MCRB and BCRB for $N = 750$ and $N = 1000$.)

Fig. 7. Plot of the MSE performance of $\hat{x}(y)$, the corresponding MCRB and BCRB as a function of $\nu$, where SNR = 40 dB. (Curves: ARD-SBL, EM, MCRB and BCRB for $N = 1000$; x-axis: degrees of freedom $\nu$ from 2 to 2.3.)

Fig. 8. Plot of the MSE performance of $\hat{x}(y)$, the corresponding MCRB and BCRB as a function of $N$, where SNR = 40 dB. (Curves: ARD-SBL, EM, MCRB and BCRB for $\nu = 2.05$; ARD-SBL, EM and MCRB for $\nu = 2.01$.)



Figure 7 shows the MSE performance of the ARD-SBL and EM-based estimators, together with the bounds, as a function of the degrees of freedom $\nu$, at an SNR of 40 dB and $N = 1000$ and 750. As expected, the MSE performance of the algorithms is better at low values of $\nu$ since the signal is more compressible, and the MCRB and BCRB also reflect this behavior. The MCRB is a very tight lower bound, especially for high values of $N$. Figure 8 shows the MSE performance as a function of the number of observations $N$, at an SNR of 40 dB and two different values of $\nu$. The MSE of the EM algorithm converges to the MCRB at higher $N$.

We see a cross-over between the BCRB and the MCRB in Fig. 6 at lower SNR. Although it is conjectured in [22] that the MCRB tends to be tighter than the BCRB, the cross-over indicates that, for the estimation of the compressible vector in the SBL problem, the MCRB (14) is not always tighter than the BCRB (8). For the MCRB to be tighter than the BCRB,
$$\left(\frac{\Phi^T\Phi}{\sigma^2} + \frac{\lambda(\nu+1)}{\nu+3} I_{L\times L}\right)^{-1} \succeq \left(\frac{\Phi^T\Phi}{\sigma^2} + \Upsilon^{-1}\right)^{-1}$$
must be satisfied, which further implies that $\frac{\lambda(\nu+1)}{\nu+3} I_{L\times L} \preceq \Upsilon^{-1}$, which is not true for all instantiations of $\Upsilon$. We further investigate this behavior in Figs. 9 and 10, where we plot the histogram of the difference between the bounds (MCRB − BCRB) for two different values of $N$, at an SNR of 20 dB and 40 dB. We see that, for a smaller number of measurements, the difference becomes negative with higher probability, indicating that the MCRB is not always the tightest bound. However, for most practically relevant values of $N$ and SNR, the MCRB is tighter than the BCRB, and hence it is the more useful bound.

Fig. 9. Histogram of (MCRB − BCRB) for critical values of $N$ at SNR = 20 dB.
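To see why the sign of (MCRB − BCRB) depends on the realization of $\Upsilon$, the following sketch uses a deliberately simplified diagonal model: it replaces $\Phi^T\Phi/\sigma^2$ by $cI$ (for $\pm1$ entries the diagonal of $\Phi^T\Phi$ equals $N$; the off-diagonal terms are simply ignored here), so both bounds become diagonal. The parameter `c`, the helper names, and the hyperprior $\gamma_i \sim \mathrm{IG}(\nu/2, \nu/(2\lambda))$ are assumptions for illustration only; this is not the computation behind Figs. 9 and 10.

```python
import random


def trace_mcrb_minus_bcrb(c, lam, nu, gammas):
    """Tr(MCRB) - Tr(BCRB) in a simplified diagonal model where Phi^T Phi / sigma^2 = c * I.
    MCRB_ii = 1 / (c + lam*(nu+1)/(nu+3)) and BCRB_ii = 1 / (c + 1/gamma_i)."""
    prior_info = lam * (nu + 1.0) / (nu + 3.0)
    mcrb = len(gammas) / (c + prior_info)
    bcrb = sum(1.0 / (c + 1.0 / g) for g in gammas)
    return mcrb - bcrb


def sample_gammas(length, nu, lam, rng):
    """gamma_i ~ IG(nu/2, nu/(2*lam)), drawn as the reciprocal of a Gamma(nu/2, scale 2*lam/nu)."""
    return [1.0 / rng.gammavariate(nu / 2.0, 2.0 * lam / nu) for _ in range(length)]


def fraction_negative(c, lam, nu, length, trials, seed=0):
    """Fraction of hyperparameter draws for which the BCRB exceeds the MCRB."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if trace_mcrb_minus_bcrb(c, lam, nu, sample_gammas(length, nu, lam, rng)) < 0:
            hits += 1
    return hits / trials


print(fraction_negative(c=50.0, lam=1.0, nu=2.01, length=64, trials=500))
```

Components with $1/\gamma_i$ below $\lambda(\nu+1)/(\nu+3)$ push the difference negative, which is exactly the violated ordering condition stated above.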

The ARD-SBL algorithm is computationally cheaper and is known to converge faster than the EM algorithm [6]. However, in this subsection, we have seen that the EM algorithm performs better than the ARD-SBL algorithm in all the scenarios considered. Hence, we use the EM algorithm in the following subsections.

B. Lower Bounds on the MSE Performance of $\hat{\gamma}(y)$

In this subsection, we compare the different lower bounds on the MSE of the estimator $\hat{\gamma}(y)$ for the SMV and MMV-SBL system models. Figure 11 shows the MSE performance of $\hat{\gamma}(y)$ as a function of SNR and $M$, when $\gamma$ is a random parameter, $N = 1000$ and $\nu = 2.01$. In this case, there is a large gap between the MSE of EM and the lower bound.

When $\gamma$ is deterministic, we first note that the EM-based ML estimator for $\gamma$ is asymptotically optimal, and the lower bounds are practically relevant for large data samples [19]. The results are listed in Table II. We see that for $L = 2048$ and $N = 1500$, the MCRB and BCRB are tight lower bounds, with the MCRB being marginally tighter than the BCRB. However, as $M$ increases, the gap between the MSE and the lower bounds increases.

Fig. 10. Histogram of (MCRB − BCRB) for critical values of $N$ at SNR = 40 dB.

Fig. 11. Plot of the MSE performance of $\hat{\gamma}(y)$ and the corresponding HCRB as a function of SNR, where $N = 1000$.





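TABLE II
MSE OF THE ESTIMATOR $\hat{\gamma}(y)$, THE MCRB, AND THE BCRB, AS A FUNCTION OF SNR FOR $N = 1500$ (DETERMINISTIC HYPERPARAMETER $\gamma$).

| SNR (dB) | MSE ($M = 1$) | MCRB ($M = 1$) | BCRB ($M = 1$) | MSE ($M = 50$) | MCRB ($M = 50$) | BCRB ($M = 50$) |
|---|---|---|---|---|---|---|
| 10 | 0.05429 | 0.05218 | 0.04880 | 0.04500 | 0.0012 | $9.766\times10^{-4}$ |
| 20 | 0.05270 | 0.05134 | 0.04880 | 0.03923 | 0.0011 | $9.766\times10^{-4}$ |
| 30 | 0.05132 | 0.05070 | 0.04880 | 0.03476 | 0.0010 | $9.766\times10^{-4}$ |
| 40 | 0.04983 | 0.04921 | 0.04880 | 0.03030 | 0.0009 | $9.766\times10^{-4}$ |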



C. Lower Bounds on the MSE Performance of $\hat{\xi}(y)$

In this subsection, we compare the lower bounds on the MSE of the estimator $\hat{\xi}(y)$ in the SMV and MMV-SBL settings. Figure 13 shows the MSE performance of $\hat{\xi}(y)$ and the corresponding HCRB for different values of $N$ and $M$. Here, $\xi$ is sampled from the IG pdf (18), illustrated in Fig. 12.

When $\xi$ is deterministic, the EM-based ML estimator for $\xi$ is asymptotically optimal, and the lower bounds are practically relevant for large data samples [19]. Table III lists the MSE of $\hat{\xi}(y)$ and the corresponding HCRB and MCRB for a deterministic but unknown noise variance, with the true noise variance fixed at $10^{-3}$. We see that for $L = 2048$ and $N$ between 1500 and 1800, the MCRB is a tighter bound than the HCRB for both values of $M$. However, when the noise variance is random, we see from Fig. 13 that a gap exists between the MSE performance and the HCRB.
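The HCRB column of Table III can be reproduced from the noise-variance Fisher information $N/(2\xi^2)$ of Proposition 3. The sketch below additionally assumes that $M$ independent measurement vectors scale this information by $M$ (an assumption here, consistent with the tabulated values); `hcrb_noise_variance` is a hypothetical helper.

```python
def hcrb_noise_variance(N, M, xi):
    """Inverse of N*M/(2*xi^2): the Fisher information N/(2*xi^2) of Proposition 3,
    scaled by M under the assumption of M independent measurement vectors."""
    return 2.0 * xi ** 2 / (N * M)


for N in (1500, 1600, 1700, 1800):
    print(N, hcrb_noise_variance(N, 1, 1e-3), hcrb_noise_variance(N, 50, 1e-3))
```

The printed values match the HCRB entries of Table III up to the last displayed digit (the table truncates rather than rounds, e.g. $0.2352\times10^{-10}$ for $N = 1700$, $M = 50$).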

Fig. 12. PDF of $\xi$ indicating the most probable values of the random parameter $\xi$ for $c = 3$ and $d = 0.2$.

TABLE III
MSE OF THE ESTIMATOR $\hat{\xi}(y)$, THE MCRB, AND THE HCRB, AS A FUNCTION OF $N$ (DETERMINISTIC NOISE VARIANCE $\xi$).

| $N$ | MSE ($M = 1$) | MCRB ($M = 1$) | HCRB ($M = 1$) | MSE ($M = 50$) | MCRB ($M = 50$) | HCRB ($M = 50$) |
|---|---|---|---|---|---|---|
| 1500 | $0.7362\times10^{-8}$ | $0.3796\times10^{-8}$ | $0.1333\times10^{-8}$ | $0.9304\times10^{-9}$ | $0.6803\times10^{-10}$ | $0.2666\times10^{-10}$ |
| 1600 | $0.6633\times10^{-8}$ | $0.3401\times10^{-8}$ | $0.1250\times10^{-8}$ | $0.8915\times10^{-9}$ | $0.6524\times10^{-10}$ | $0.2500\times10^{-10}$ |
| 1700 | $0.6360\times10^{-8}$ | $0.3071\times10^{-8}$ | $0.1176\times10^{-8}$ | $0.8661\times10^{-9}$ | $0.6142\times10^{-10}$ | $0.2352\times10^{-10}$ |
| 1800 | $0.5924\times10^{-8}$ | $0.2792\times10^{-8}$ | $0.1111\times10^{-8}$ | $0.8466\times10^{-9}$ | $0.5732\times10^{-10}$ | $0.2222\times10^{-10}$ |

Fig. 13. Plot of the MSE performance of $\hat{\xi}(y)$ along with the HCRB as a function of $N$. (Curves: EM and HCRB for $M = 1$ and $M = 50$; x-axis: number of observations $N$ from 600 to 1000.)

VII. Conclusion

In this work, we have derived Cramér-Rao-type lower bounds on the MSE, namely, the HCRB, BCRB and MCRB, for the SMV and MMV-SBL framework for estimating compressible signals. We used the hierarchical models for the compressible priors employed in SBL to derive bounds under various assumptions on the unknown parameters. The bounds obtained by assuming a hyperprior distribution on the hyperparameters provided key insights into the MSE performance of SBL and the values of the parameters that govern these hyperpriors. We derived the MCRB for the generalized compressible prior distribution, which in turn led to lower bounds when a Student-t or generalized Pareto prior distribution is assumed on $x$. We verified the lower bounds numerically against the MSE performance of the ARD-SBL and EM algorithms, using Monte Carlo simulations. From the numerical results, we saw that the MCRB is tighter than the BCRB for high $N$ and SNR. An interesting contribution of this work is the illustration of the optimality of EM-based updates for SBL, since the MSE of the algorithm coincides with the MCRB, as seen from the simulations.

February 7, 2012 DRAFT

Appendix 

A. Proof of Proposition 1 

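Using the graphical model of Fig. 1 in (5), $H^\theta(x)$ is computed as
$$H^\theta(x) \triangleq -\mathbb{E}_{Y,X;\gamma}\left[\nabla_x^2\log p_{Y,X;\gamma}(y,x;\gamma)\right] = -\mathbb{E}_{Y,X;\gamma}\left[\nabla_x\left(\frac{\Phi^T(y-\Phi x)}{\sigma^2} - \Upsilon^{-1}x\right)\right] = \frac{\Phi^T\Phi}{\sigma^2} + \Upsilon^{-1}. \tag{23}$$
Similarly, it is straightforward to show that $\nabla_x\nabla_\gamma\log p_{Y,X;\gamma}(y,x;\gamma) = \operatorname{diag}\left(\frac{x_1}{\gamma_1^2},\ldots,\frac{x_L}{\gamma_L^2}\right)$. Since the $x_i$ are zero-mean random variables,
$$H^\theta(\gamma,x) = -\mathbb{E}_{Y,X;\gamma}\left(\nabla_\gamma^T\nabla_x\log p_{Y,X;\gamma}(y,x;\gamma)\right) = 0_{L\times L}. \tag{24}$$
Further,
$$H^\theta(\gamma) \triangleq -\mathbb{E}_{Y,X;\gamma}\left[\nabla_\gamma^2\left(\log p_{Y|X}(y|x) + \log p_{X;\gamma}(x;\gamma)\right)\right]. \tag{25}$$
Now, since $\log p_{X;\gamma}(x;\gamma) = \sum_{i=1}^{L}\log p_{X;\gamma}(x_i;\gamma_i)$, we get
$$\frac{\partial^2\log p_{X;\gamma}(x;\gamma)}{\partial\gamma_i\,\partial\gamma_j} = \begin{cases}\dfrac{1}{2\gamma_i^2} - \dfrac{x_i^2}{\gamma_i^3} & \text{if } i = j,\\[1ex] 0 & \text{if } i \neq j.\end{cases} \tag{26}$$
Taking $-\mathbb{E}_{X;\gamma}(\cdot)$ on both sides of the equation above and noting that $\mathbb{E}_{X;\gamma}(x_i^2) = \gamma_i$, we obtain
$$H^\theta(\gamma) = \operatorname{diag}\left(-\mathbb{E}_{X;\gamma}\left[\frac{\partial^2\log p_{X;\gamma}(x;\gamma)}{\partial\gamma_i^2}\right]\right) = \operatorname{diag}\left(\frac{1}{2\gamma_1^2},\ldots,\frac{1}{2\gamma_L^2}\right). \tag{27}$$
This completes the proof.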
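The diagonal entry in (27) can be sanity-checked by Monte Carlo: averaging the negative of the $i = j$ case of (26) over draws $x_i \sim \mathcal{N}(0,\gamma_i)$ should approach $1/(2\gamma_i^2)$. This is a minimal sketch with a hypothetical helper name and an arbitrary test value $\gamma_i = 2$.

```python
import random


def fisher_gamma_monte_carlo(gamma, n_samples, seed=0):
    """Monte Carlo estimate of -E[d^2/dgamma^2 log N(x; 0, gamma)] using (26) with i = j."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, gamma ** 0.5)
        total += -(1.0 / (2.0 * gamma ** 2) - x ** 2 / gamma ** 3)
    return total / n_samples


gamma = 2.0
estimate = fisher_gamma_monte_carlo(gamma, 200_000)
print(estimate, 1.0 / (2.0 * gamma ** 2))  # the two numbers should be close
```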



B. Proof of Proposition 2 

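First, note that the expressions for $H^\theta(x)$ and $B^\theta(x)$ are the same, and hence, $B^\theta(x)$ remains the same as in (23). From the graphical model of Fig. 1 and (5), $B^\theta(\gamma)$ is defined for $\theta = [x, \gamma]$ as
$$B^\theta(\gamma) \triangleq -\mathbb{E}_{Y,X,\Gamma}\left[\nabla_\gamma^2\left(\log p_{Y|X}(y|x) + \log p_{X|\Gamma}(x|\gamma) + \log p_\Gamma(\gamma)\right)\right]. \tag{28}$$
Since the expressions for $\log p_{X|\Gamma}(x|\gamma)$ and $\log p_\Gamma(\gamma)$ are separable and symmetric w.r.t. each $\gamma_i$, the off-diagonal terms of $B^\theta(\gamma)$ are zero, and it is sufficient to evaluate the diagonal terms $-\mathbb{E}_{Y,X,\Gamma}\left[\frac{\partial^2}{\partial\gamma_i^2}\left(\log p_{X|\Gamma}(x|\gamma) + \log p_\Gamma(\gamma)\right)\right]$. With the inverse gamma hyperprior $p_\Gamma(\gamma_i) = \frac{(\nu/2\lambda)^{\nu/2}}{\Gamma(\nu/2)}\gamma_i^{-\nu/2-1}\exp\left(-\frac{\nu}{2\lambda\gamma_i}\right)$, differentiating the expression w.r.t. each $\gamma_i$ twice gives
$$\frac{\partial^2}{\partial\gamma_i^2}\left(\log p_{X|\Gamma}(x|\gamma) + \log p_\Gamma(\gamma)\right) = \frac{1}{2\gamma_i^2} - \frac{x_i^2}{\gamma_i^3} + \frac{\nu+2}{2\gamma_i^2} - \frac{\nu}{\lambda\gamma_i^3}. \tag{29}$$
Since $\mathbb{E}_{X|\Gamma}(x_i^2) = \gamma_i$, the expression for $-\mathbb{E}_{Y,X,\Gamma}[\cdot]$ is given by
$$\int_0^\infty\left(\frac{\nu}{\lambda\gamma_i^3} - \frac{\nu+1}{2\gamma_i^2}\right)\frac{(\nu/2\lambda)^{\nu/2}}{\Gamma(\nu/2)}\gamma_i^{-\nu/2-1}\exp\left(-\frac{\nu}{2\lambda\gamma_i}\right)d\gamma_i. \tag{30}$$
Using the inverse gamma moments $\mathbb{E}_\Gamma(\gamma_i^{-2}) = \frac{\lambda^2(\nu+2)}{\nu}$ and $\mathbb{E}_\Gamma(\gamma_i^{-3}) = \frac{\lambda^3(\nu+2)(\nu+4)}{\nu^2}$, the above integral reduces to
$$-\mathbb{E}_{Y,X,\Gamma}\left[\frac{\partial^2}{\partial\gamma_i^2}\left(\log p_{X|\Gamma}(x|\gamma) + \log p_\Gamma(\gamma)\right)\right] = \frac{\lambda^2(\nu+2)(\nu+7)}{2\nu}. \tag{31}$$
Using (5), the $(i,j)$th component of the matrix $B^\theta(\gamma,x)$ is obtained from
$$\frac{\partial^2\log p_{X|\Gamma}(x|\gamma)}{\partial\gamma_i\,\partial x_j} = \begin{cases}\dfrac{x_i}{\gamma_i^2} & \text{if } i = j,\\[1ex] 0 & \text{if } i \neq j,\end{cases} \tag{32}$$
and $B^\theta(x,\gamma) = (B^\theta(\gamma,x))^T$. Since $\mathbb{E}_{X|\Gamma}(x_i) = 0$, $B^\theta(\gamma,x) = 0_{L\times L}$. This completes the proof.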
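The following sketch evaluates the integral (30) numerically and compares it with the closed form $\lambda^2(\nu+2)(\nu+7)/(2\nu)$ obtained from the inverse gamma moments. It assumes the hyperprior $\gamma_i \sim \mathrm{IG}(\nu/2, \nu/(2\lambda))$, works in $u = 1/\gamma_i \sim \mathrm{Gamma}(\text{shape } \nu/2, \text{rate } \nu/(2\lambda))$, and uses Simpson's rule on a truncated interval; the helper names and the truncation point are illustrative choices, not part of the paper.

```python
import math


def inverse_gamma_expectation(fn, a, b, upper, n=20000):
    """E[fn(u)] where u = 1/gamma ~ Gamma(shape a, rate b), i.e. gamma ~ IG(a, b).
    Simpson's rule on [0, upper]; the density is taken as 0 at u = 0 (valid for a > 1)."""
    def dens(u):
        if u <= 0.0:
            return 0.0
        return b ** a / math.gamma(a) * u ** (a - 1) * math.exp(-b * u)

    h = upper / n
    total = fn(0.0) * dens(0.0) + fn(upper) * dens(upper)
    for k in range(1, n):
        u = k * h
        total += (4 if k % 2 else 2) * fn(u) * dens(u)
    return total * h / 3.0


def bcrb_gamma_numeric(nu, lam):
    a, b = nu / 2.0, nu / (2.0 * lam)   # assumed hyperprior IG(nu/2, nu/(2*lam))
    # integrand of (30) written in u = 1/gamma_i: nu/lam * u^3 - (nu+1)/2 * u^2
    return inverse_gamma_expectation(
        lambda u: nu / lam * u ** 3 - (nu + 1.0) / 2.0 * u ** 2, a, b, upper=80.0)


def bcrb_gamma_closed(nu, lam):
    return lam ** 2 * (nu + 2.0) * (nu + 7.0) / (2.0 * nu)


print(bcrb_gamma_numeric(5.0, 2.0), bcrb_gamma_closed(5.0, 2.0))
```

The upper limit of 80 is adequate only when the rate $\nu/(2\lambda)$ is of order one; larger $\lambda$ needs a proportionally larger limit.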
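C. Proof of Theorem 1

To establish the regularity condition, the first-order derivative of the log likelihood $\log p_{Y;\gamma}(y;\gamma)$ is required. This, in turn, requires the evaluation of $\frac{\partial\log|\Sigma_y|}{\partial\gamma_j}$ and $\frac{\partial\,y^T\Sigma_y^{-1}y}{\partial\gamma_j}$. The derivative of $\log|\Sigma_y|$ w.r.t. $\gamma_j$ is obtained using the chain rule [26] as follows:
$$\frac{\partial\log|\Sigma_y|}{\partial\gamma_j} = \operatorname{Tr}\left(\left(\frac{\partial\log|\Sigma_y|}{\partial\Sigma_y}\right)^T\frac{\partial\Sigma_y}{\partial\gamma_j}\right) = \operatorname{Tr}\left(\Sigma_y^{-1}\Phi_j\Phi_j^T\right) = \Phi_j^T\Sigma_y^{-1}\Phi_j. \tag{33}$$
Here, we have used the identity $\nabla_X\log|X| = X^{-T}$ [26], and results from vector calculus [26] to obtain $\frac{\partial\Sigma_y}{\partial\gamma_j} = \Phi_j\Phi_j^T$, where $\Phi_j$ is the $j$th column of $\Phi$. Similarly, the derivative of $y^T\Sigma_y^{-1}y$ can be obtained as
$$\frac{\partial\,y^T\Sigma_y^{-1}y}{\partial\gamma_j} = \operatorname{Tr}\left(yy^T\frac{\partial\Sigma_y^{-1}}{\partial\gamma_j}\right) = -\operatorname{Tr}\left(yy^T\Sigma_y^{-1}\Phi_j\Phi_j^T\Sigma_y^{-1}\right) = -\Phi_j^T\Sigma_y^{-1}yy^T\Sigma_y^{-1}\Phi_j, \tag{34}$$
and hence,
$$\frac{\partial}{\partial\gamma_j}\log p_{Y;\gamma}(y;\gamma) = -\frac{1}{2}\left(\Phi_j^T\Sigma_y^{-1}\Phi_j - \Phi_j^T\Sigma_y^{-1}yy^T\Sigma_y^{-1}\Phi_j\right). \tag{35}$$
Taking $\mathbb{E}_{Y;\gamma}(\cdot)$ on both sides of the above equation, it can be seen that
$$\mathbb{E}_{Y;\gamma}\left[\frac{\partial}{\partial\gamma_j}\log p_{Y;\gamma}(y;\gamma)\right] = -\frac{1}{2}\left(\Phi_j^T\Sigma_y^{-1}\Phi_j - \Phi_j^T\Sigma_y^{-1}\mathbb{E}_{Y;\gamma}(yy^T)\Sigma_y^{-1}\Phi_j\right) = 0, \tag{36}$$
since $\mathbb{E}_{Y;\gamma}(yy^T) = \Sigma_y$. Hence, the pdf satisfies the required regularity condition.

Using the regularity condition above, the MCRB for $\theta = [\gamma]$ is obtained by computing the second derivative of the log likelihood as follows:
$$\frac{\partial^2\log p_{Y;\gamma}(y;\gamma)}{\partial\gamma_i\,\partial\gamma_j} = \frac{1}{2}\left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)^2 - \left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)\Phi_i^T\Sigma_y^{-1}yy^T\Sigma_y^{-1}\Phi_j.$$
Taking $-\mathbb{E}_{Y;\gamma}(\cdot)$ on both sides of the above expression, we obtain
$$(M^{\gamma})_{ij} = -\mathbb{E}_{Y;\gamma}\left[\frac{\partial^2\log p_{Y;\gamma}(y;\gamma)}{\partial\gamma_i\,\partial\gamma_j}\right] \tag{37}$$
$$= -\frac{1}{2}\left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)^2 + \left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)\Phi_i^T\Sigma_y^{-1}\mathbb{E}_{Y;\gamma}(yy^T)\Sigma_y^{-1}\Phi_j \tag{38}$$
$$= -\frac{1}{2}\left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)^2 + \left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)^2 \tag{39}$$
$$= \frac{1}{2}\left(\Phi_i^T\Sigma_y^{-1}\Phi_j\right)^2, \tag{40}$$
as stated in (11). This completes the proof.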
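The derivatives (33) and (34) can be checked against central finite differences on a small example. This sketch assumes $\Sigma_y = \xi I + \Phi\Upsilon\Phi^T$ (consistent with $\partial\Sigma_y/\partial\gamma_j = \Phi_j\Phi_j^T$) and uses a hypothetical $2\times3$ matrix `PHI` with $\pm1$ entries, so the $2\times2$ inverse and determinant can be written in closed form without external libraries.

```python
import math

PHI = [[1.0, -1.0, 1.0], [1.0, 1.0, -1.0]]  # hypothetical 2 x 3 measurement matrix


def sigma_y(gamma, xi):
    """Sigma_y = xi * I + Phi diag(gamma) Phi^T (2 x 2)."""
    return [[(xi if r == c else 0.0)
             + sum(PHI[r][k] * gamma[k] * PHI[c][k] for k in range(len(gamma)))
             for c in range(2)] for r in range(2)]


def inv_and_det(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]], det


def quad(u, m, v):
    """u^T m v for 2-vectors u, v and a 2 x 2 matrix m."""
    return sum(u[r] * m[r][c] * v[c] for r in range(2) for c in range(2))


def analytic(gamma, xi, y, j):
    s_inv, _ = inv_and_det(sigma_y(gamma, xi))
    pj = [PHI[0][j], PHI[1][j]]
    w = quad(pj, s_inv, y)                  # Phi_j^T Sigma_y^{-1} y
    return quad(pj, s_inv, pj), -w * w      # right-hand sides of (33) and (34)


def finite_diff(gamma, xi, y, j, h=1e-6):
    def f(g):
        s_inv, det = inv_and_det(sigma_y(g, xi))
        return math.log(det), quad(y, s_inv, y)

    gp, gm = list(gamma), list(gamma)
    gp[j] += h
    gm[j] -= h
    (a1, b1), (a0, b0) = f(gp), f(gm)
    return (a1 - a0) / (2 * h), (b1 - b0) / (2 * h)


gamma, xi, y = [0.5, 2.0, 1.0], 0.1, [0.3, -0.7]
for j in range(3):
    print(j, analytic(gamma, xi, y, j), finite_diff(gamma, xi, y, j))
```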
D. Proof of Theorem 2

The proof follows from the proof of Theorem 3 in Appendix E by substituting $\tau = 2$.



E. Proof of Theorem 3 

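The MCRB for estimation of the compressible random vector with $\theta = [x]$ is given by
$$M^x = -\mathbb{E}_{Y,X}\left[\nabla_x^2\log p_{Y,X}(y,x)\right] = -\mathbb{E}_{Y,X}\left[\nabla_x^2\log p_{Y|X}(y|x) + \nabla_x^2\log p_X(x)\right].$$
The first term above is given by
$$-\mathbb{E}_{Y,X}\left[\nabla_x^2\log p_{Y|X}(y|x)\right] = -\mathbb{E}_{Y,X}\left[\nabla_x\frac{\Phi^T(y-\Phi x)}{\sigma^2}\right] \tag{41}$$
$$= \frac{\Phi^T\Phi}{\sigma^2}. \tag{42}$$
Note that $p_X(x)$ is not differentiable if any of its components $x_j = 0$. However, the event $x_j = 0$ has probability zero since the distribution is continuous, and hence, this condition can be safely ignored. Since the prior of $x$ factorizes over its components and is symmetric in all $x_j$, $\nabla_x\log p_X(x)$ is obtained by differentiating w.r.t. the individual $x_i$:
$$\frac{\partial}{\partial x_i}\log p_X(x) = \begin{cases}-\dfrac{(\nu+1)\lambda x_i^{\tau-1}}{\nu\left(1+\frac{\lambda x_i^{\tau}}{\nu}\right)} & \text{if } x_i > 0,\\[2ex] \dfrac{(\nu+1)\lambda(-x_i)^{\tau-1}}{\nu\left(1+\frac{\lambda(-x_i)^{\tau}}{\nu}\right)} & \text{if } x_i < 0.\end{cases} \tag{43}$$
First, we consider the case $x_i > 0$. Differentiating the above w.r.t. $x_i$ again, we obtain
$$\frac{\partial^2}{\partial x_i^2}\log p_X(x) = -\frac{(\nu+1)(\tau-1)\lambda x_i^{\tau-2}}{\nu\left(1+\frac{\lambda x_i^{\tau}}{\nu}\right)} + \frac{\lambda^2\tau(\nu+1)x_i^{2\tau-2}}{\nu^2\left(1+\frac{\lambda x_i^{\tau}}{\nu}\right)^2}. \tag{44}$$
Taking $-\mathbb{E}_X(\cdot)$ on both sides of the above equation, with $K$ denoting the normalizing constant of the prior, the contribution of the region $x_i > 0$ is
$$\frac{K(\nu+1)\lambda}{\nu}\int_0^\infty\left[\frac{(\tau-1)x_i^{\tau-2}}{\left(1+\frac{\lambda x_i^{\tau}}{\nu}\right)^{(\nu+\tau+1)/\tau}} - \frac{\lambda\tau x_i^{2\tau-2}}{\nu\left(1+\frac{\lambda x_i^{\tau}}{\nu}\right)^{(\nu+2\tau+1)/\tau}}\right]dx_i. \tag{45}$$
The above can be simplified using the transformation $t_i = \frac{\lambda x_i^{\tau}}{\nu}$ and the identity $\int_0^\infty\frac{t^{u-1}}{(1+t)^{u+v}}\,dt = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}$, which gives
$$\frac{K(\nu+1)(\tau-1)}{\nu+\tau+1}\left(\frac{\lambda}{\nu}\right)^{1/\tau}\Gamma\left(1-\frac{1}{\tau}\right)\frac{\Gamma\left(\frac{\nu+2}{\tau}\right)}{\Gamma\left(\frac{\nu+1}{\tau}\right)}\quad\text{for } x_i > 0. \tag{46}$$
For the case $x_i < 0$, the integral simplifies to the one given in (45). Hence, the overall expression is given by
$$-\mathbb{E}_X\left(\frac{\partial^2}{\partial x_i^2}\log p_X(x)\right) = \frac{2K(\nu+1)(\tau-1)}{\nu+\tau+1}\left(\frac{\lambda}{\nu}\right)^{1/\tau}\Gamma\left(1-\frac{1}{\tau}\right)\frac{\Gamma\left(\frac{\nu+2}{\tau}\right)}{\Gamma\left(\frac{\nu+1}{\tau}\right)}. \tag{47}$$
Substituting the expression for $K$, namely $K = \frac{\tau\,\Gamma\left(\frac{\nu+1}{\tau}\right)}{2\,\Gamma\left(\frac{1}{\tau}\right)\Gamma\left(\frac{\nu}{\tau}\right)}\left(\frac{\lambda}{\nu}\right)^{1/\tau}$, in the above, we get
$$-\mathbb{E}_X\left(\frac{\partial^2}{\partial x_i^2}\log p_X(x)\right) = \frac{\tau^2(\nu+1)}{\nu+\tau+1}\left(\frac{\lambda}{\nu}\right)^{2/\tau}\frac{\Gamma\left(\frac{\nu+2}{\tau}\right)\Gamma\left(2-\frac{1}{\tau}\right)}{\Gamma\left(\frac{1}{\tau}\right)\Gamma\left(\frac{\nu}{\tau}\right)}.$$
Combining the expression above and (42), we obtain the overall MCRB as given in (17).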
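The closed-form prior information above can be checked in two ways: for $\tau = 2$ it should reduce to the Student-t value $\lambda(\nu+1)/(\nu+3)$, and for other $\tau > 1$ it should match $\mathbb{E}[(\partial\log p_X/\partial x_i)^2]$ computed by quadrature. The sketch assumes the prior density is proportional to $(1+\lambda|x|^\tau/\nu)^{-(\nu+1)/\tau}$ (the form whose log-derivative is (43)) and normalizes it numerically, so it does not rely on the expression for $K$; the helper names are hypothetical.

```python
import math


def prior_information_closed(nu, lam, tau):
    """-E[d^2/dx_i^2 log p_X(x)] for the generalized compressible prior (expression before (17))."""
    return (tau ** 2 * (nu + 1) / (nu + tau + 1)
            * (lam / nu) ** (2.0 / tau)
            * math.gamma((nu + 2) / tau) * math.gamma(2 - 1 / tau)
            / (math.gamma(1 / tau) * math.gamma(nu / tau)))


def prior_information_quadrature(nu, lam, tau, n=20000):
    """E[(d/dx log p)^2] by Simpson's rule, with p(x) proportional to
    (1 + lam*|x|^tau/nu)^(-(nu+1)/tau); only x >= 0 is integrated (symmetry)."""
    def unnorm(x):
        return (1 + lam * x ** tau / nu) ** (-(nu + 1) / tau)

    def score(x):  # (43) for x > 0
        return -(nu + 1) * lam * x ** (tau - 1) / (nu + lam * x ** tau)

    def mapped(g):
        # substitute x = s / (1 - s), dx = ds / (1 - s)^2, s in [0, 1)
        def f(s):
            if s >= 1.0:
                return 0.0  # integrands vanish as x -> infinity
            x = s / (1 - s)
            return g(x) / (1 - s) ** 2
        return f

    def simpson(f):
        h = 1.0 / n
        total = f(0.0) + f(1.0)
        for k in range(1, n):
            total += (4 if k % 2 else 2) * f(k * h)
        return total * h / 3

    z = simpson(mapped(unnorm))  # numerical normalization
    return simpson(mapped(lambda x: score(x) ** 2 * unnorm(x))) / z


print(prior_information_closed(3.0, 2.0, 2.0), 2.0 * 4.0 / 6.0)
print(prior_information_closed(3.0, 1.0, 1.5), prior_information_quadrature(3.0, 1.0, 1.5))
```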

F. Proof of Proposition 3 

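In this case, we define $\theta' = [x, \gamma]$ and hence, $\theta = [\theta', \xi]$. In order to compute the HCRB, we need to find $H^\theta(\xi)$, $H^\theta(\theta')$ and $H^\theta(\theta',\xi)$. We have $\log p_{Y,X;\gamma,\xi}(y,x;\gamma,\xi) = \log p_{Y|X;\xi}(y|x;\xi) + \log p_{X;\gamma}(x;\gamma)$, where $\xi = \sigma^2$. Using (5), the submatrix $H^\theta(\theta') = H^{\theta'}$, i.e., the same as computed earlier in (8). Hence, we focus on the block matrices that arise due to the addition of $\xi$. First, $H^\theta(\xi)$ is computed as in Sec. 3.6 in [19]: since $\frac{\partial^2}{\partial\xi^2}\log p_{Y|X;\xi}(y|x;\xi) = \frac{N}{2\xi^2} - \frac{\|y-\Phi x\|_2^2}{\xi^3}$ and $\mathbb{E}_{Y|X;\xi}\|y-\Phi x\|_2^2 = N\xi$,
$$H^\theta(\xi) = -\mathbb{E}_{Y,X;\xi}\left[\frac{\partial^2\log p_{Y|X;\xi}(y|x;\xi)}{\partial\xi^2}\right] = \frac{N}{2\xi^2}.$$
From Lemma 2, it directly follows that $H^\theta(\gamma,\xi) = 0_{L\times1}$. Using (5), we compute $H^\theta(x,\xi)$ as follows:
$$H^\theta(x,\xi) = \frac{1}{\xi^2}\,\mathbb{E}_X\left(\mathbb{E}_{Y|X;\xi}\left(\Phi^Ty - \Phi^T\Phi x\right)\right). \tag{49}$$
Since $\mathbb{E}_{Y|X;\xi}(y) = \Phi x$, $\mathbb{E}_X\left(\Phi^T\Phi x - \Phi^T\Phi x\right) = 0_{L\times1}$. This completes the proof.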

G. Proof of Proposition 4 

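In this case, we define $\theta = [\theta', \xi]$ and $\theta' = [x, \gamma]$. In order to compute the HCRB, we need to find $H^\theta(\xi)$, $H^\theta(\theta')$ and $H^\theta(\theta',\xi)$. Using (5), we know that the expression for $H^\theta(\theta')$ is the same as computed earlier in (8). Since $\xi$ is random, the expectation operator includes an expectation over $\xi$, and hence,
$$H^\theta(\xi) = -\mathbb{E}_{Y,X,\Xi}\left[\frac{\partial^2}{\partial\xi^2}\left(\log p_{Y|X,\Xi}(y|x,\xi) + \log p_\Xi(\xi)\right)\right]. \tag{50}$$
Using $\mathbb{E}_{Y|X,\Xi}\|y-\Phi x\|_2^2 = N\xi$ and the IG pdf $p_\Xi(\xi) = \frac{d^c}{\Gamma(c)}\xi^{-c-1}\exp\left(-\frac{d}{\xi}\right)$ of (18), the above expectation is evaluated as follows:
$$H^\theta(\xi) = \int_0^\infty\left(\frac{N/2-c-1}{\xi^2} + \frac{2d}{\xi^3}\right)\frac{d^c}{\Gamma(c)}\xi^{-c-1}\exp\left(-\frac{d}{\xi}\right)d\xi = \frac{c(c+1)(N/2+c+3)}{d^2}. \tag{51}$$
To find the other components of the matrix, we compute $H^\theta(\theta',\xi) = (H^\theta(\xi,\theta'))^T$, which consists of $H^\theta(\gamma,\xi)$ and $H^\theta(x,\xi)$. From Lemma 2, $H^\theta(\gamma,\xi) = 0_{L\times1}$. Using the definition of $H^\theta(x,\xi)$, we see that $H^\theta(x,\xi) = (H^\theta(\xi,x))^T = 0_{L\times1}$, from (49) and since $p_\Xi(\xi)$ is not a function of $x_i$. Thus, we obtain the FIM given by (21).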
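The closed form in (51) can be confirmed numerically. Substituting $u = 1/\xi$ turns the IG($c$, $d$) pdf into a Gamma density with shape $c$ and rate $d$, and the integrand becomes $\left((N/2-c-1)u^2 + 2du^3\right)$ times that density. The sketch below applies Simpson's rule with the Fig. 12 values $c = 3$, $d = 0.2$ and an illustrative $N = 1000$; the helper names and the truncation point are assumptions.

```python
import math


def hcrb_xi_random_numeric(N, c, d, upper=400.0, n=20000):
    """Evaluate the integral in (51) by Simpson's rule after substituting u = 1/xi.
    If xi ~ IG(c, d) with pdf d^c/Gamma(c) xi^(-c-1) exp(-d/xi), then u ~ Gamma(shape c, rate d)."""
    h = upper / n

    def integrand(u):
        density = d ** c / math.gamma(c) * u ** (c - 1) * math.exp(-d * u)
        return ((N / 2.0 - c - 1.0) * u ** 2 + 2.0 * d * u ** 3) * density

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


def hcrb_xi_random_closed(N, c, d):
    return c * (c + 1.0) * (N / 2.0 + c + 3.0) / d ** 2


print(hcrb_xi_random_numeric(1000, 3.0, 0.2), hcrb_xi_random_closed(1000, 3.0, 0.2))
```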



H. Proof of Theorem 4 

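The proof of Theorem 4 requires the regularity condition when the noise variance is not known. We have already shown in (36) that the log likelihood function satisfies the regularity condition when $\theta = [\gamma]$. In this section, we show that the log likelihood $\log p_{Y;\gamma,\xi}(y;\gamma,\xi)$ in (3) satisfies the regularity condition w.r.t. $\xi$. Differentiating the log likelihood w.r.t. $\xi$, using $\partial\Sigma_y/\partial\xi = I_N$, and taking $\mathbb{E}_{Y;\gamma,\xi}(\cdot)$ on both sides of the equation,
$$\frac{\partial}{\partial\xi}\log p_{Y;\gamma,\xi}(y;\gamma,\xi) = \frac{\partial}{\partial\xi}\left(-\frac{1}{2}\log|\Sigma_y| - \frac{1}{2}y^T\Sigma_y^{-1}y\right) = -\frac{1}{2}\left[\operatorname{Tr}(\Sigma_y^{-1}) - \operatorname{Tr}\left(yy^T\Sigma_y^{-1}\Sigma_y^{-1}\right)\right], \tag{52}$$
$$\mathbb{E}_{Y;\gamma,\xi}\left[\frac{\partial}{\partial\xi}\log p_{Y;\gamma,\xi}(y;\gamma,\xi)\right] = -\frac{1}{2}\left[\operatorname{Tr}(\Sigma_y^{-1}) - \operatorname{Tr}\left(\mathbb{E}_{Y;\gamma,\xi}(yy^T)\Sigma_y^{-2}\right)\right] = -\frac{1}{2}\left[\operatorname{Tr}(\Sigma_y^{-1}) - \operatorname{Tr}(\Sigma_y^{-1})\right] = 0. \tag{53}$$
Hence, the regularity condition is satisfied. From (40), we already have $M^\theta(\gamma)$. To obtain $M^\theta(\xi)$, we differentiate (52) w.r.t. $\xi$ to obtain
$$\frac{\partial^2}{\partial\xi^2}\log p_{Y;\gamma,\xi}(y;\gamma,\xi) = \frac{1}{2}\operatorname{Tr}(\Sigma_y^{-2}) - \operatorname{Tr}\left(yy^T\Sigma_y^{-3}\right). \tag{54}$$
Taking $-\mathbb{E}_{Y;\gamma,\xi}(\cdot)$ on both sides of the above equation,
$$M^\theta(\xi) = -\mathbb{E}_{Y;\gamma,\xi}\left[\frac{1}{2}\operatorname{Tr}(\Sigma_y^{-2}) - \operatorname{Tr}\left(yy^T\Sigma_y^{-3}\right)\right] = \operatorname{Tr}(\Sigma_y^{-2}) - \frac{1}{2}\operatorname{Tr}(\Sigma_y^{-2}) = \frac{1}{2}\operatorname{Tr}(\Sigma_y^{-2}). \tag{55}$$
The vector $M^\theta(\gamma,\xi)$ can be found by differentiating (35) w.r.t. $\xi$ and taking the negative expectation to obtain
$$\left(M^\theta(\gamma,\xi)\right)_j = -\mathbb{E}_{Y;\gamma,\xi}\left[\frac{\partial^2\log p_{Y;\gamma,\xi}(y;\gamma,\xi)}{\partial\gamma_j\,\partial\xi}\right] = \frac{1}{2}\Phi_j^T\Sigma_y^{-2}\Phi_j. \tag{56}$$
Since $M^\theta(\xi,\gamma) = (M^\theta(\gamma,\xi))^T$, the $j$th entry of $M^\theta(\xi,\gamma)$ is also $\frac{1}{2}\Phi_j^T\Sigma_y^{-2}\Phi_j$. The overall expression for the MCRB $M^\theta$ can now be obtained by combining the expressions in (40), (55) and (56); this completes the proof.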
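As a consistency check, (40), (55) and (56) should agree with the standard zero-mean Gaussian Fisher information identity $F_{ab} = \frac{1}{2}\operatorname{Tr}\left(\Sigma_y^{-1}\frac{\partial\Sigma_y}{\partial a}\Sigma_y^{-1}\frac{\partial\Sigma_y}{\partial b}\right)$, with $\partial\Sigma_y/\partial\gamma_i = \Phi_i\Phi_i^T$ and $\partial\Sigma_y/\partial\xi = I$. The sketch below assumes $\Sigma_y = \xi I + \Phi\Upsilon\Phi^T$, uses $N = 2$ so the matrix inverse has a closed form, and picks hypothetical columns $\Phi_j$; it is an illustration, not part of the proof.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inv2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


def quad(u, m, v):
    return sum(u[i] * m[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def gaussian_fim(s_inv, ds_a, ds_b):
    """Zero-mean Gaussian FIM entry: 0.5 * Tr(S^-1 dS/da S^-1 dS/db)."""
    return 0.5 * trace(matmul(matmul(s_inv, ds_a), matmul(s_inv, ds_b)))


def max_mismatch(cols, gamma, xi):
    """Largest gap between the generic FIM and the closed forms (40), (55), (56), for N = 2."""
    sigma = [[xi * (i == j) + sum(g * col[i] * col[j] for g, col in zip(gamma, cols))
              for j in range(2)] for i in range(2)]
    s_inv = inv2(sigma)
    s_inv2 = matmul(s_inv, s_inv)
    eye = [[1.0, 0.0], [0.0, 1.0]]                                    # dSigma_y / dxi
    gaps = [abs(gaussian_fim(s_inv, eye, eye) - 0.5 * trace(s_inv2))]  # (55)
    for ci in cols:
        d_i = outer(ci, ci)                                           # dSigma_y / dgamma_i
        gaps.append(abs(gaussian_fim(s_inv, d_i, eye) - 0.5 * quad(ci, s_inv2, ci)))  # (56)
        for cj in cols:
            d_j = outer(cj, cj)
            gaps.append(abs(gaussian_fim(s_inv, d_i, d_j)
                            - 0.5 * quad(ci, s_inv, cj) ** 2))        # (40)
    return max(gaps)


cols = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]  # hypothetical columns Phi_j of a 2 x 3 Phi
print(max_mismatch(cols, gamma=[0.5, 2.0, 1.0], xi=0.1))  # rounding-error level
```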